Preface for the First Edition


HISTORICAL NOTE 


The theory of probability is concerned with events that occur when randomness or chance 
influences the result. While the data from a sample survey or the occurrence of extreme
weather patterns are common enough examples of situations where randomness is involved, 
we have come to presume that many models of the physical world contain elements of 
randomness as well. Scientists now commonly suppose that their models contain random 
components as well as deterministic components. Randomness, of course, does not involve 
any new physical forces; rather than measuring all the forces involved and thus predicting 
the exact outcome of an experiment, we choose to combine all these forces and call the 
result random. The study of random events is the subject of this book. 

It is impossible to chronicle the first interest in events involving randomness or chance, 
but we do know of a correspondence between Blaise Pascal and Pierre de Fermat in the middle
of the seventeenth century regarding questions arising in gambling games. Appropriate
mathematical tools for the analysis of such situations were not available at that time, but 
interest continued among some mathematicians. For a long time, the subject was connected 
only to gambling games and its development was considerably restricted by the situations 
arising from such considerations. Mathematical techniques suitable for problems involving
randomness have produced a theory applicable not only to gambling situations but also to
more practical situations. It has not been until recent years, however, that scientists and
engineers have become increasingly aware of the presence of random factors in their experiments
and manufacturing processes and have become interested in measuring or controlling
these factors.

It is the realization that the statistical analysis of experimental data, based on the theory 
of probability, is of great importance to experimenters that has brought the theory to the 
forefront of applicable mathematics. The history of probability and the statistical analysis 
it makes possible illustrate a prime example of seemingly useless mathematical research 
that now has an incredibly wide range of practical application. Mathematical models for 
experimental situations now commonly involve both deterministic and random terms. It 
is perhaps a simplification to say that science, while interested in deterministic models to 
explain the physical world, now is interested as well in separating deterministic factors from 
random factors and measuring their relative importance. 

There are two facts that strike me as most remarkable about the theory of probability. 
One is the apparent contradiction that random events are in reality well behaved and that 
there are laws of probability. The outcome on one toss of a coin cannot be predicted, but 
given 10,000 tosses of the same coin, many events can be predicted with a high degree of 
accuracy. The second fact, which the reader will soon perceive, is the pervasiveness of a 
probability distribution known as the normal distribution. This distribution, which will be 
defined and discussed at some length, arises in situations which at first glance have little in 
common: the normal distribution is an essential tool in statistical modeling and is perhaps
the single most important concept in statistical inference. 
There are reasons for this, and it is my purpose to explain these in this book. 


ABOUT THE TEXT 


From the author’s perspective, the characteristics of this text which most clearly differentiate
it from others currently available include the following:


• Applications to a variety of scientific fields, including engineering, appear in every
chapter.

• Integration of computer algebra systems such as Mathematica provides insight into
both the structure and results of problems in probability.

• A great variety of problems at varying levels of difficulty provides a desirable
flexibility in assignments.

• Topics in statistics appear throughout the text so that professors can include or omit
these as the nature of their course warrants.

• Some problems are structured and solved using recursions since computers and
computer algebra systems facilitate this.

• Significant and practical topics in quality control and quality production are
introduced.


It has been my purpose to write a book that is readable by students who have some 
background in multivariable calculus. Mathematical ideas are often easily understood until 
one sees formal definitions that frequently obscure such understanding. Examples allow us 
to explore ideas without the burden of language. Therefore, I often begin with examples 
and follow with the ideas motivated first by them; this is quite purposeful on my part, since 
language often obstructs understanding of otherwise simply perceived notions. 

I have attempted to give examples that are interesting and often practical in order to 
show the widespread applicability of the subject. I have sometimes sacrificed exact mathematical
precision for the sake of readability; readers who seek a more advanced explication
of the subject will have no trouble in finding suitable sources. I have proceeded in the belief 
that beginning students want most to know what the subject encompasses and for what it 
may be useful. More theoretical courses may then be chosen as time and opportunity allow. 
For those interested, the bibliography contains a number of current references. 

An author has considerable control over the reader by selecting the material, its order 
of presentation, and the explication. I am hopeful that I have executed these duties with due 
regard for the reader. While the author may not be described with any sort of precision as 
the holder of a tightrope, I have been guided by the admonition: “It’s not healthy for the 
tightrope walker to be misunderstood by the person who’s holding the rope.”

The book makes free use of the now widely available computer algebra systems. I have 
used Mathematica, Maple, and Derive for various problems and examples in the book, and 
I hope the reader has access to one of these marvelous mathematical aids. These systems 
allow us the incredible opportunity to see graphs and surfaces easily, which otherwise would 
be very difficult and time-consuming to produce. Computer algebra systems make some 
parts of mathematics visual and thereby add immensely to our understanding. Derivatives,
integrals, series expansions, numerical computation, and the solution of recursions are used
throughout the book, but the reader will find that only the results are included: in my opinion
there is no longer any reason to dwell on calculation of either a numeric or algebraic
sort. We can now concentrate on the meaning of the results without being restrained by the
often mechanical effort in achieving them; hence our concentration is on the structure of
the problem and the insight the solution gives. Graphs are freely drawn and, when appropriate,
a geometric view of the problem is given so that the solution and the problem can
be visualized. Numerical approximations are given when exact solutions are not feasible.
The reader without a computer algebra system can still do the problems; the reader with
such a system can reproduce every graph in the book exactly as it appears. I have included
a fairly extensive appendix in which computer commands in Mathematica are given for
many of the examples in which Mathematica was used; this should also ease the translation
to other computer algebra systems. The reader with access to a computer algebra system
should refer to Appendix 1 fairly frequently.

Although I hope the book is readable and as completely explanatory as a probability 
text may be, I know that students often do not read the text, but proceed directly to the 
problems. There is nothing wrong with this; after all, if the ability to solve practical problems
is the goal, then the student who can do this without reading the text is to be admired.
Readers are warned, however, that probability problems are rarely repetitive; the solution 
of one problem does not necessarily give even any sort of hint as to the solution of the next 
problem. I have included over 840 problems so that a reader who solves the problems can 
be reasonably assured that the concepts involving them are understood. 

The problem sections begin with the easiest problems and gradually work their way 
up to some reasonably difficult problems while remaining within the scope and level of the 
book. In discussing a forthcoming examination with my students, I summarize the material 
and give some suggestions for practice problems, so I have followed each chapter by a 
Chapter Summary, some suggestions for Review Problems, and finally some Supplementary
Problems.


FOR THE INSTRUCTOR 


Texts on probability often use generating functions and recursions in the solution of many 
complex problems; with our use of computer algebra systems, we can determine generating 
functions, and often their power series expansions, with ease. The structure of generating 
functions is also used to explain limiting behavior in many situations. Many interesting 
problems can be best described in terms of recursions; since computer algebra systems 
allow us to solve such recursions, some discussion of recursive functions is given. Proofs are 
often given using recursions, a novel feature of the book. Occasionally, the more traditional 
proofs are given in the exercises. 

Although numerous applications of the theory are given in the text and in the problems, 
the text by no means exhausts the applications of the theory of probability. In addition to 
solving many practical and varied problems, the theory of probability also provides the 
basis for the theory of statistical inference and the analysis of data. Statistical analysis is 
combined with the theory of probability throughout the book. Hypothesis testing, confidence
intervals, acceptance sampling, and control charts are considered at various points in
the text. The order in which these topics are to be considered is entirely up to the instructor;
the book is quite flexible in allowing sections to be skipped, or delayed, resulting in rearrangement
of the material. This book will serve as a first introduction to statistics, but the
reader who intends to apply statistics should also elect a course in applied statistics. In my 
opinion, statistics will be the centerpiece of applied mathematics in the twenty-first century. 


Preface for the Second Edition


I am pleased to offer a second edition of this text. The reasons for writing the book remain
the same and are indicated in the preface for the first edition. While the book remains readable
and, I hope, useful for both the student and the instructor, I want to point out some differences
between the two editions.

• The first edition was written when Mathematica was in its fourth release; it is now
in its ninth release and while its capabilities have grown, some of the commands,
especially those regarding graphs, have changed. Therefore, Appendix 1 is totally
new, reflecting the changes in Mathematica.

• Both first and second editions contain about 120 graphs; these have been mostly
redrawn.

• The problems are of primary importance to the student. Being able to solve them
verifies the student’s mastery of the material. The book now contains over 880
problems, 60 or so of which are new.

• Chapter 7, titled “Some Challenging Problems”, is new. Five problems, or sets
of problems, some of which have been studied by famous mathematicians, are
introduced. Open questions are given, some of which will challenge the reader.
Problems are almost always capable of extension; the reader may do this while
doing a project regarding one of the major problems.


I have profited from comments from both instructors and students who used the first
edition. In a sense I owe a debt to every student of mine at Rose-Hulman Institute of Technology.
Heartfelt thanks go to Sari Freedman and my editor, Susanne Steitz-Filler
of John Wiley & Sons. Sangeetha Parthasarathy of LaserWords has been very helpful and
patient during the production process. I have been fortunate to rely on the extensive computer
skills of my nephew, Scott Carter, to whom I owe a big thank you. But I owe the
greatest debt to my wife, Cherry, who has put up with my long hours in the study. I also
owe a pat on the head to Ginger, who allowed me to refresh while guiding me on long
walks through our Old North End neighborhood.


JOHN J. KINNEY 
March 4, 2014 


Colorado Springs 
Chapter 1


Sample Spaces and Probability 


1.1 DISCRETE SAMPLE SPACES 


Probability theory deals with situations in which there is an element of randomness or 
chance. Some models of the physical world are deterministic, that is, they predict exactly 
what will happen under certain circumstances. For example, if an object is dropped from 
a height and given no initial velocity, its distance, s, from the starting point is given by
$s = \frac{1}{2} g t^2$, where g is the acceleration due to gravity and t is the time. If one tried to
apply the formula in a practical situation, one would not find very satisfactory results. The
problem is that the formula applies only in a vacuum and ignores the shape of the object
and the resistance of the air as well as other factors. Although some of these factors can be
determined, we generally combine them and say that the result has a random or chance component.
Our model then becomes $s = \frac{1}{2} g t^2 + \epsilon$, where $\epsilon$ denotes the random component
of the model. In contrast with the deterministic model, this model is stochastic.
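
The contrast between the two models can be made concrete with a short simulation. This is only a sketch: the text does not specify how the random component behaves, so the choice of a normal distribution with standard deviation 0.05 m, the value g = 9.81 m/s^2, and all function names below are assumptions made for illustration.

```python
import random

G = 9.81  # acceleration due to gravity in m/s^2 (assumed value)


def deterministic_distance(t):
    """Distance fallen after t seconds under the deterministic model."""
    return 0.5 * G * t ** 2


def stochastic_distance(t, rng, sigma=0.05):
    """Deterministic distance plus a random component.

    The random component is drawn from a normal distribution with mean 0
    and standard deviation sigma; this distribution is an assumption,
    not something the text prescribes.
    """
    return deterministic_distance(t) + rng.gauss(0.0, sigma)


rng = random.Random(1)  # fixed seed so the run is repeatable
t = 2.0
exact = deterministic_distance(t)
observed = [stochastic_distance(t, rng) for _ in range(5)]

print("deterministic:", exact)
print("stochastic   :", [round(s, 3) for s in observed])
```

Every call to the deterministic function returns the same distance, while repeated calls to the stochastic function scatter around it.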

Science often considers stochastic models; in formulating new models, the scientist 
may try to determine the contributions of both deterministic and random components of 
the model in predicting accurate results. 

The mathematical theory of probability arose in consideration of games of chance, 
but, as the above-mentioned example shows, it is now widely used in far more practical and 
applied situations. We encounter other circumstances frequently in everyday life in which 
we presume that some random factors are at work. Here are some simple examples. What 
is the chance I will find that all eight traffic lights I pass through on my way to work are 
green? What are my chances for winning a lottery? I have a ten-volume encyclopedia that I 
have packed in separate boxes. If the boxes become mixed up and I draw the volumes out at 
random, what is the chance that my encyclopedia will be in order? My desk lamp has a bulb 
that is “guaranteed” to last 5000 hours. It has been used for 3000 hours. What is the chance 
that I must replace it before 2000 more hours are used? Each of these situations involves a 
random event whose specific outcome is unpredictable in advance. 

Probability theory has become important because of the wide variety of practical problems
it solves and its role in science. It is also the basis of the statistical analysis of data that
is widely used in industry and in experimentation. Consider some examples. A manufacturer
of television sets may know that 1% of the television sets manufactured have defects
of some kind. What is the chance that a shipment of 200 sets a dealer has received contains
2% defective sets? Solving problems such as these has become important to manufacturers
who are anxious to produce high quality products, and indeed such considerations play
a central role in what has become known in manufacturing as statistical process control.
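
One way to attack the television question is sketched here. It rests on assumptions the text has not yet made: that "2% defective" means exactly 4 defective sets among the 200, that each set is defective independently with probability 0.01, and that the count of defectives therefore follows a binomial model. The function and variable names are illustrative only.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k defectives among n sets when each set is
    independently defective with probability p (binomial model, assumed)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n_sets = 200        # size of the shipment
p_defective = 0.01  # 1% of sets are defective
k_defective = 4     # 2% of 200 sets

prob_exactly_four = binomial_pmf(k_defective, n_sets, p_defective)
print(f"P(exactly {k_defective} defective) = {prob_exactly_four:.4f}")
```

Under these assumptions the chance comes out at roughly 9%, so a shipment with twice the usual defect rate is not especially rare.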


Probability: An Introduction with Statistical Applications, Second Edition. John J. Kinney. 
© 2015 John Wiley & Sons, Inc. Published 2015 by John Wiley & Sons, Inc. 




Sample surveys, in which only a portion of a population or reference set is investigated, 
have become commonplace. A recent survey, for example, showed that two-thirds of welfare
recipients in the United States were not old enough to vote. But surely we do not know
that exactly two-thirds of all welfare recipients were not old enough to vote; there is some
uncertainty, largely dependent on the size of the sample investigated as well as the manner
in which the survey was conducted, connected with this result. How is this uncertainty
calculated? 

As a final example, consider a scientific investigation into, say, the relationship between
temperature, a catalyst, and pressure in creating a chemical compound. A scientist can
only carry out a few experiments in which several combinations of temperatures, amount
of catalyst, and level of pressure are investigated. Furthermore, there is an element of
randomness (largely due to other, unmeasured, factors) that influences the amount of compound
produced. How is the scientist to determine which combination of factors maximizes
the amount of chemical compound? We will encounter many of these examples in this 
book. 

In some situations, we could measure all the forces involved and predict the outcome 
precisely but very often choose not to do so. In the traffic light example, we could, by 
knowledge of the timing of the lights, my speed, and the traffic pattern, predict precisely 
the color of each light as I approach it. While this is possible, it is probably not worth the 
effort, so we combine all the forces involved and call the result “chance.” So “chance” as 
we use it does not imply any new or unknown physical forces; it is simply an umbrella 
under which we put forces we choose not to measure. 

How do we then measure the probability of events such as those described earlier? How 
do we determine how likely such events are? Such probability problems may be puzzling 
to us since we lack a framework in which to solve them. We lack a strategy for dealing with 
the randomness involved in these situations. A sensible way to begin is to consider all the 
possibilities that could occur. Such a list, or set, is called a sample space. 

We begin here with some situations that are admittedly much simpler than some of 
those described earlier; more complex problems will also be encountered in this book. 

We will consider situations that we call experiments. These are situations that can be 
repeated under identical circumstances. Those of interest to us will involve some randomness
so that the outcomes cannot be precisely predicted in advance. As examples, consider
the following: 


• Two people are chosen at random from a group of five people.

• Choose one of two brands of breakfast cereal at random.

• Throw two fair dice.

• Take an actuarial examination until it is passed for the first time.

• Any laboratory experiment.

Clearly, the first four of these experiments involve random factors. Laboratory experiments
involve random factors as well and we would probably choose not to measure all the
factors so as to be able to predict the exact outcome in advance. 

Once the conditions for the experiment are set, and we are assured that these
conditions can be repeated exactly, we can form the sample space, which we define as
follows:


Definition A sample space is a set of all the possible outcomes from an experiment.


Example 1.1.1


The sample spaces for the first four experiments mentioned above are as follows: 



(a) (Choose two people at random from a group of five people.) Denoting the five
people as A, B, C, D, and E, we find, if we disregard the order in which the persons
are chosen, that there are ten possible samples of two people:


S = {AB, AC, AD, AE, BC, BD, BE, CD, CE, DE}. 


This set, S, then comprises the sample space for the experiment. 

If we consider the choice of people as random, we might expect that each of 
these ten samples occurs about 10% of the time. Further, we see that any particular 
person, say B, occurs in exactly four of the samples, so we say the probability that 
any particular person is in the sample is $\frac{4}{10} = \frac{2}{5}$. The reader may be interested
to show that if three people were selected from a group of five people, then the
probability a particular person is in the sample is $\frac{3}{5}$. Here, there is a pattern that we
can establish with some results to be developed later in this chapter.
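
The counting in part (a) can be checked by listing the samples mechanically. The following is a minimal sketch; the helper name inclusion_probability is hypothetical, and the calculation assumes, as in the text, that every sample is equally likely.

```python
from fractions import Fraction
from itertools import combinations

people = "ABCDE"


def inclusion_probability(group, k, person):
    """Fraction of the size-k samples from group that contain person,
    treating all samples as equally likely."""
    samples = list(combinations(group, k))
    containing = [s for s in samples if person in s]
    return Fraction(len(containing), len(samples))


# The ten unordered samples of two people.
samples_of_two = ["".join(s) for s in combinations(people, 2)]
print(samples_of_two)

p_two = inclusion_probability(people, 2, "B")    # samples of two people
p_three = inclusion_probability(people, 3, "B")  # samples of three people
print(p_two, p_three)
```

Replacing "B" by any other person gives the same fractions, which is what "any particular person" requires.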


(b) (Choose one of two brands of breakfast cereal at random.) Denote the brands as K
and P. We take the sample space as

S = {K, P},


where the set S contains each of the elementary outcomes, K and P. 


(c) (Toss two fair dice.) In contrast with the first two examples, we might consider
several different sample spaces. Suppose first that we distinguish the two dice by
color, say one is red and the other is green. Then we could write the result of a toss
as an ordered pair indicating the outcome on each die, giving say the result on the
red die first and the result on the green die second. Let a sample space be


$S_1 = \{(1,1), (1,2), \ldots, (1,6), (2,1), (2,2), \ldots, (2,6), \ldots, (6,6)\}.$


It is useful to see this sample space as a geometric space as in Figure 1.1. 

Note that the 36 dots represent the only possible outcomes from the experiment.
The sample space is not continuous in any sense in this case and may differ
from our notions of a geometric space.

We could also describe all the possible outcomes from the experiment by
the set

$S_2 = \{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}$


since one of these sums must occur when the two dice are thrown. 
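
Both descriptions of the two-dice experiment can be generated directly, which also shows how the ordered pairs collapse onto the sums. This is a sketch only; the names S1, S2, and points_per_sum are chosen here for illustration.

```python
from collections import Counter
from itertools import product

faces = range(1, 7)

# Ordered pairs (red die, green die): the first sample space.
S1 = list(product(faces, repeat=2))

# Possible sums: the second sample space.
S2 = sorted({red + green for red, green in S1})

# Number of ordered pairs that produce each sum.
points_per_sum = Counter(red + green for red, green in S1)

print(len(S1), S2)
for total in S2:
    print(total, points_per_sum[total])
```

The counts are unequal across the sums, so the points of the second sample space cannot all be treated as equally frequent even when the ordered pairs are.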


Figure 1.1 Sample space for tossing two dice (axes: first die and second die).

Which sample space should be chosen? Note that each point in $S_2$ represents
at least one point in $S_1$. So, while we might consider each of the 36 points in
$S_1$ to occur with equal frequency if we threw the dice a large number of times,
we would not consider that to be true if we chose sample space $S_2$. A sum of
7, for example, occurs on 6 of the points in $S_1$ while a sum of 2 occurs at only
one point in $S_1$. The choice of sample space is largely dependent on what sort
of outcomes are of interest when the experiment is performed. It is not uncommon
for an experiment to admit more than one sample space. We generally select
the sample space most convenient for the analysis of the probabilities involved in
the problem.

We continue now with further examples of experiments involving randomness.


(d) (Take an actuarial examination until it is passed for the first time.) Letting P and F
denote passing and failing the examination, respectively, we note that the sample
space here is infinite:


S = {P, FP, FFP, FFFP, ...}.


However, S here is a countably infinite sample space since its elements can be
counted in the sense that they can be placed in a one-to-one correspondence with
the set of natural numbers {1, 2, 3, 4, ...} as follows:


P ↔ 1
FP ↔ 2
FFP ↔ 3


The rule for the one-to-one correspondence is as follows: given an entry in the left
column, the corresponding entry in the right column is the number of the attempt
on which the examination is passed; given an entry in the right column, say n,
consider n - 1 F’s followed by P to construct the corresponding entry in the left
column. Hence, the correspondence with the set of natural numbers is one-to-one.
Such sets are called countable or denumerable. We will consider countably infinite
sets in much the same way that we will consider finite sets. In the next chapter, we
will encounter infinite sets that are not countable.


(e) Sample spaces for laboratory experiments are usually difficult to enumerate and
may involve a combination of finite and infinite factors.


Example 1.1.2 


As a more difficult example, consider observing single births in a hospital until two girls 
are born in a row. 

The sample space now is a bit more challenging to write down than the sample spaces 
for the situations considered in Example 1.1.1. 
For convenience, we write the points, showing the births in order and grouped by the
total number of births.


Number of    Sample      Number of
Births       Points      Sample Points

2            GG          1

3            BGG         1

4            BBGG        2
             GBGG

5            BBBGG       3
             BGBGG
             GBBGG

6            BBBBGG      5
             BBGBGG
             BGBBGG
             GBBBGG
             GBGBGG


and so on. We note that the number of sample points as we have grouped them follows the
sequence 1, 1, 2, 3, 5, ..., which we recognize as the beginning of the Fibonacci sequence.
The Fibonacci sequence is found by starting with the sequence 1, 1. Subsequent entries are 
found by adding the two immediately preceding entries. However, we only have evidence 
that the Fibonacci sequence applies to a few of the groups of points in the sample space. 
We will have to establish the general pattern in this example before concluding that the 
Fibonacci sequence does indeed give the number of sample points in the sample space. The 
reader may wish to do that before reading the following paragraphs! 

Here is the reason the Fibonacci sequence occurs: consider a sequence of B’s and G’s
in which GG occurs for the first time at the nth birth. Let $a_n$ denote the number of ways
in which this can occur. If GG occurs for the first time on the nth birth, there are two
possibilities for the beginning of the sequence. These possibilities are mutually exclusive,
that is, they cannot occur together.

One possibility is that the sequence begins with a B and is followed for the first time
by the occurrence of GG in n - 1 births. Since we are requiring the sequence GG to occur
for the first time at the (n - 1)st birth, this can occur in $a_{n-1}$ ways.

The other possibility for the beginning of the sequence is that the sequence begins
with G, which must then be followed by B (else the pattern GG will occur in two births)
and then the pattern GG occurs in n - 2 births. This can occur in $a_{n-2}$ ways. Since the
sequence begins either with B or G, it follows that


$a_n = a_{n-1} + a_{n-2}, \quad n \geq 4, \quad \text{where } a_2 = a_3 = 1,$    (1.1)

which describes the Fibonacci sequence.

The sequences for which GG occurs for the first time in 7 births can then be found
by writing B followed by the sequences for 6 births and by writing GB followed by GG in
5 births:

B|BBBBGG
B|BBGBGG 
B|BGBBGG 
B|GBBBGG 
B|GBGBGG 


GB|BBBGG 
GB|BGBGG 
GB|GBBGG 


Formulas such as (1.1) often describe a problem in a very succinct manner; they are
called recursions because they describe one value of a function, here $a_n$, in terms of other
values of the same function; in addition, they are easily programmed. Computer algebra
systems are especially helpful in giving a large number of terms determined by recursions.
One can find, for example, that there are 46,368 ways for the sequence GG to occur for the 
first time on the 25th birth. It is difficult to imagine determining this number without the 
use of a computer. 
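
Recursion (1.1), with starting values 1 for two births and 1 for three births, is indeed easily programmed. The sketch below uses Python rather than a computer algebra system, and the function names are hypothetical. It computes the counts from the recursion and, for small numbers of births, confirms them by listing every qualifying sequence directly.

```python
from itertools import product


def first_gg_count(n):
    """Number of birth sequences in which GG first occurs at birth n,
    computed from the recursion with starting values 1 (n = 2) and 1 (n = 3)."""
    if n < 2:
        raise ValueError("GG cannot occur before the second birth")
    prev, curr = 1, 1  # counts for n = 2 and n = 3
    for _ in range(4, n + 1):
        prev, curr = curr, prev + curr
    return curr


def first_gg_sequences(n):
    """Brute-force list of the sequences of length n that end in GG
    and contain no earlier GG."""
    found = []
    for births in product("BG", repeat=n):
        s = "".join(births)
        if s.endswith("GG") and "GG" not in s[:-1]:
            found.append(s)
    return found


print(first_gg_sequences(7))
print([first_gg_count(n) for n in range(2, 11)])
print(first_gg_count(25))
```

The brute-force listing grows too quickly to use for 25 births, which is exactly where the recursion earns its keep.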


EXERCISES 1.1 


1. Show the sample space when 3 people are selected from a group of 5 people. Verify
that the probability that any particular person is in the selected group is 3/5.


2. In Example 1.1.2, show all the sample points where the births of two girls in a row 
occur in 8 or 9 births. 

3. An experiment consists of drawing two numbered balls from a box of balls numbered
from 1 to 9. Describe the sample space if
(a) the first ball is not replaced before the second is drawn. 
(b) the first ball is replaced before the second is drawn. 


4. In the diagram below, A, B, and C are switches that may be closed (current flows 
through the switch) or open (current cannot flow through the switch). Show the sample 
space indicating all the possible positions of the switches in the circuit. 

5. Items being produced on an assembly line can be good (G) or not meeting specifications
(N). Show the sample space for the next five items produced by the assembly line.

6. A student decides to take an actuarial examination until it is passed, but will attempt
the test at most five times. Show the sample space.

7. In the World Series, games are played until one of the teams has won four games. Show
all the points in the sample space in which the American League (A) wins the series
over the National League (N) in at most six games.

8. We are interested in the sequence of male and female births in five-child families. Show
the sample space.

9. Twelve chips numbered 1 through 12 are mixed in a bowl. Two chips are drawn successively
and without replacement. Show the sample space for the experiment.

10. An assembly line is observed until items of both types, good (G) items and items not
meeting specification (N), are observed. Show the sample space.

11. Two numbers are chosen without replacement from the set {2, 3, 4, 5, 6, 7}, with the
additional restriction that the second number chosen must be smaller than the first.
Describe an appropriate sample space for the experiment.

12. Computer chips coming off an assembly line are marked defective (D) or nondefective
(N). The chips are tested and their condition listed. This is continued until two consecutive
defectives are produced or until four chips have been tested, whichever occurs
first. Show a sample space for the experiment.

13. A coin is tossed five times and a running count of the heads and tails is kept (so the
number of heads and the number of tails tossed so far is recorded at each toss). Show
all the sample points where the heads count always exceeds the tails count.

14. A sample space consists of all the linear arrangements of the integers 1, 2, 3, 4, and 5.
(These linear arrangements are called permutations.)
(a) Use your computer algebra system to list all the sample points.
(b) If the sample points are equally likely, what is the probability that the number 3 is
in the third position?
(c) What is the probability that none of the integers occupies its natural position?

1.2 EVENTS; AXIOMS OF PROBABILITY 


After establishing a sample space, we are often interested in particular points, or sets of 
points, in that sample space. Consider the following examples: 


(a) An item is selected at random from a production line. We are interested in the 
selection of a good item. 


(b) Two dice are tossed. We are interested in the occurrence of a sum of 5. 
(c) Births are observed until a girl is born. We are interested in this occurring in an 
even number of births. 


Let us begin by defining an event. 


Definition An event is a subset of a sample space. 


Events then contain one or more elementary outcomes in the sample space. 

In the earlier examples, “a good item is selected,” “the sum is 5,” and “an even number
of births was observed” can be described by subsets of the appropriate sample space and 
are, therefore, events. 

We say that an event occurs if any of the elementary outcomes contained in the event 
occurs. 

We will be interested in the relative frequency with which these events occur. In 
example (a), we would most likely say, if 99% of the items produced in the production 
line are good, then a good item will be selected about 99% of the time the experiment is 
performed, but we would expect some variation from this figure. In example (b), such a 
calculation is more complex since the event “the sum of the spots showing on the dice is 
5” comprises several more elementary events. If the sample space distinguishing a red and 
a green die is 

$$S = \{(1, 1), (1, 2), \ldots, (1, 6), (2, 1), \ldots, (6, 6)\},$$


then the points where the sum is 5 are 
(1, 4), (2, 3), (3, 2), (4, 1). 


If the dice are fair, then each of the 36 points in S occurs about 1/36 of the time, so we 
conclude that a sum of 5 occurs about 4 · 1/36 = 1/9 of the time. 
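
The count for example (b) can be checked by enumerating the sample space directly. The following Python sketch is an added illustration, not part of the original development; the names `sample_space`, `sum_is_5`, and `p_sum_is_5` are chosen for this example only, and the dice are assumed fair so that all 36 points are equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (red die, green die) for two distinguishable dice.
sample_space = list(product(range(1, 7), repeat=2))

# Event: the sum of the spots showing is 5.
sum_is_5 = [point for point in sample_space if sum(point) == 5]

# Under the equally-likely assumption, each point carries probability 1/36.
p_sum_is_5 = Fraction(len(sum_is_5), len(sample_space))

print(sum_is_5)    # [(1, 4), (2, 3), (3, 2), (4, 1)]
print(p_sum_is_5)  # 1/9
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared with the hand calculation without rounding.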

In example (c), observing births until a girl is born, the event “an even number of births 
is observed” is much more complex than examples (a) and (b) since there is an infinity of 
possibilities. How are we to judge the frequency of occurrence of each one? We cannot 
answer this question at this time, but we will consider it later. 

Now we consider a structure so that we can deal with such questions, as well as many 
others far more complex than those considered so far. We start with some assumptions about 
any sample space. 


Axioms of Probability 


We consider the long-range relative frequency or probability of an event in a sample space. 
If we perform an experiment 120 times and an event, A, occurs 30 times, then we say that 
the relative frequency of A is 30/120 = 1/4. In general, if in n trials an event A occurs 
n(A) times, then we say that the relative frequency of A is n(A)/n. Of course, if we perform the 
experiment another n times, we do not expect A to occur exactly the same number of times 
as before, giving another relative frequency for the event A. We do expect these variable 
ratios representing relative frequencies to settle down in some manner as n grows large. If 
A is an event, we denote this limiting relative frequency by the probability of A and denote 
this by P(A). 


Definition If A is an event, then the probability of A is 

$$P(A) = \lim_{n \to \infty} \frac{n(A)}{n}.$$


We assume at this point that the limit exists. We will discuss this in detail in Chapter 4. 
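
The way relative frequencies settle down can be illustrated, though not proved, by simulation. The sketch below is an added illustration under stated assumptions: it uses Python's pseudorandom generator as a stand-in for fair dice, the event A is taken to be "the sum of two dice is 5" from example (b), and the function name `relative_frequency` and the fixed seed are hypothetical choices made only so the run is repeatable.

```python
import random

def relative_frequency(n_trials, seed=0):
    """Return n(A)/n for the event A = 'the sum of two dice is 5'."""
    rng = random.Random(seed)  # fixed seed: an assumption for repeatability
    occurrences = 0
    for _ in range(n_trials):
        if rng.randint(1, 6) + rng.randint(1, 6) == 5:
            occurrences += 1
    return occurrences / n_trials

# The ratio n(A)/n varies for small n and tends to stabilize as n grows.
for n in (120, 1_200, 12_000, 120_000):
    print(n, relative_frequency(n))
```

For large n the printed ratios should lie close to 1/9, the value suggested by counting sample points, although any single run will show some variation.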


In considering events, it is most convenient to use the language and notation of sets 
where the following notations are common: 

The union of sets A and B is denoted by A ∪ B where 

$$A \cup B = \{x \mid x \in A \text{ or } x \in B\},$$

where the word “or” is used in the inclusive sense, that is, an element in both sets A and B 
is included in the union of the sets. 

The intersection of sets A and B is denoted by A ∩ B where 

$$A \cap B = \{x \mid x \in A \text{ and } x \in B\}.$$


We will consider the following as axiomatic or self-evident: 


(1) P(A) ≥ 0, where A is an event, 
(2) P(S) = 1, where S is the sample space, and 


(3) If A₁, A₂, ... are disjoint or mutually exclusive, that is, they have no sample points 
in common, then 

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).$$


Axioms of probability, of course, should reflect our common intuition about the occur- 
rence of events. Since an event cannot occur with a negative relative frequency, (1) is 
evident. Since something must occur when the experiment is done and since S denotes the 
entire sample space, S must occur with relative frequency 1, hence assumption (2). Now 
suppose A and B are events with no sample points in common. We can illustrate events in a 
graphic manner by drawing a rectangle that represents all the points in S; events are subsets 
of this sample space. A diagram showing the event A, that is, the set of all elements of S 
that are in the event A, is shown in Figure 1.2. Illustrations of sets and their relationships 
with each other are called Venn diagrams. 

The event A or B consists of all points in A or in B and so, when A and B have no points 
in common, its relative frequency is the sum of the relative frequencies of A and B. This is 
assumption (3). Figure 1.3 shows a Venn diagram illustrating the disjoint events A and B. 


Figure 1.2 Venn diagram showing the event A. 


Figure 1.3 Venn diagram showing disjoint events A and B. 

No further axioms will be needed in our development of probability theory. We now 
consider some consequences of these assumptions. 


1.3 PROBABILITY THEOREMS 


In the above-mentioned example (b), we considered the event that the sum was 5 when two 
dice were thrown. This event in turn comprises elementary events 


(1, 4), (2, 3), (3, 2), (4, 1) 


each of which had probability 1/36. Since the events (1, 4), (2, 3), (3, 2), and (4, 1) are 
disjoint, axiom (3) shows that the probability of the event that the sum is 5 is the sum of the 
probabilities of these four elementary events or 1/36 + 1/36 + 1/36 + 1/36 = 4/36 = 1/9. 

Assumption (3) shows that if A is an event comprising the elementary disjoint events 
a₁, a₂, a₃, ..., aₙ, then 


Theorem 1: 

$$P(A) = \sum_{i=1}^{n} P(a_i).$$


This fact is often used in the establishment of the theorems we consider in this section. 
Although we will not do so, all of them can be explained using Theorem 1. 


What can we say about P(A U B) if A and B have sample points in common? If we find 
P(A) + P(B) 


we will have counted the points in the intersection A ∩ B twice, as shown in Figure 1.4. So 
the intersection must be subtracted once giving 


Theorem 2: 

$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$


Figure 1.4 Venn diagram showing arbitrary events A and B. 


We call this the addition theorem (for two events). 


Example 1.3.1 


Choose a card from a well-shuffled deck of cards. Let A be the event “the selected card is 
a heart,” and let B be the event “the selected card is a face card.” Let the sample space S 
consist of one point for each of the 52 cards. If the deck is really well shuffled, each point in 
S can be presumed to have probability 1/52. The event A contains 13 points and the event B 
contains 12 points, so P(A) = 13/52 and P(B) = 12/52. But the events A and B have three 
sample points in common, those for the King, Queen, and Jack of Hearts. The event A ∪ B 
is then the event “the selected card is a Heart or a face card,” and its probability is 

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{13}{52} + \frac{12}{52} - \frac{3}{52} = \frac{22}{52} = \frac{11}{26}.$$


It is also easy to see by direct counting that the event “the selected card is a Heart or a 
face card” contains exactly 22 points in the sample space of 52 points. 
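
Both routes in Example 1.3.1, direct counting and the addition theorem, can be compared in a few lines. This Python sketch is an added illustration; the deck representation and the helper `prob` are assumptions made for the example, with every card taken to be equally likely.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Spades", "Clubs"]
deck = set(product(ranks, suits))  # 52 equally likely sample points

A = {card for card in deck if card[1] == "Hearts"}         # the card is a heart
B = {card for card in deck if card[0] in {"J", "Q", "K"}}  # the card is a face card

def prob(event):
    """Probability of an event (a set of cards) under equal likelihood."""
    return Fraction(len(event), len(deck))

direct = prob(A | B)                            # count the union directly
by_theorem_2 = prob(A) + prob(B) - prob(A & B)  # addition theorem

print(len(A | B), direct, by_theorem_2)  # 22 11/26 11/26
```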

How can the addition theorem for two events be extended to three or more events? First, 
consider events A, B, and C in a sample space S. By adding and subtracting probabilities, 
the reader may be able to see that 


Theorem 3: 


$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C),$$


but we offer another proof as well. This proof will be based on the fact that a correct 
expression for P(A ∪ B ∪ C) must count each sample point in the event A ∪ B ∪ C once and only 
once. The Venn diagram in Figure 1.5 shows that S comprises 8 disjoint regions labeled as 

0: points outside A ∪ B ∪ C (1 region) 

1: points in A, B, or C alone (3 regions) 

2: points in exactly two of the events (3 regions) 


3: points in A ∩ B ∩ C (1 region). 


Figure 1.5 Venn diagram showing events, A, 
B, and C. 


Now we show that the right-hand side of Theorem 3 counts each point in the event 
A ∪ B ∪ C once and only once. By symmetry, we can consider only four cases: 


Case 1. Suppose a point is in event A only. Then its probability is counted only once, 
in P(A), on the right-hand side of Theorem 3. 


Case 2. Suppose a point is in A ∩ B only. Then its probability is counted in P(A), P(B), 
and P(A ∩ B) (added twice and subtracted once), a net count of one on the right-hand side 
in Theorem 3. 

Case 3. Suppose a point is in A ∩ B ∩ C. Then its probability is counted in each term 
on the right-hand side of Theorem 3 (added three times, subtracted three times, and added 
once more), yielding a net count of 1. 


Case 4. If a point is outside A ∪ B ∪ C, then it is not counted on the right-hand side in 
Theorem 3. 


So Theorem 3 must be correct since it counts each point in A ∪ B ∪ C exactly once 
and never counts any point outside the event A ∪ B ∪ C. This proof uses a combinatorial 
principle, that of inclusion and exclusion, a principle used in other ways as well in the field 
of combinatorics. We will make some use of this principle in the remainder of the book. 
Theorem 2 is of course a special case of Theorem 3. 

We would like to extend Theorem 3 to n events, but this requires some combinatorial 
facts that will be developed later and so we postpone this extension until they are estab- 
lished. 


Example 1.3.2 


A card is again drawn from a well-shuffled deck. Consider the events 


A: the card shows an even number (2, 4, 6, 8, or 10), 
B: the card is a Heart, and 
C: the card is black. 
We use a sample space containing one point for each of the 52 cards in the deck. 
Then P(A) = 20/52, P(B) = 13/52, P(C) = 26/52, P(A ∩ B) = 5/52, P(A ∩ C) = 10/52, 
P(B ∩ C) = 0, and P(A ∩ B ∩ C) = 0, so by Theorem 3, 

$$P(A \cup B \cup C) = \frac{20}{52} + \frac{13}{52} + \frac{26}{52} - \frac{5}{52} - \frac{10}{52} = \frac{44}{52} = \frac{11}{13}.$$

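
The inclusion and exclusion count in Example 1.3.2 can be confirmed by building the three events as sets. The Python sketch below is an added illustration; the deck encoding and the helper `prob` are assumptions for this example, and "black" is taken to mean spades and clubs.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Spades", "Clubs"]
deck = set(product(ranks, suits))

A = {c for c in deck if c[0] in {"2", "4", "6", "8", "10"}}  # even number
B = {c for c in deck if c[1] == "Hearts"}                    # heart
C = {c for c in deck if c[1] in {"Spades", "Clubs"}}         # black card

def prob(event):
    """Probability of an event under the equally-likely assumption."""
    return Fraction(len(event), len(deck))

# Right-hand side of Theorem 3 (inclusion and exclusion for three events).
by_theorem_3 = (prob(A) + prob(B) + prob(C)
                - prob(A & B) - prob(A & C) - prob(B & C)
                + prob(A & B & C))

# Left-hand side, obtained by counting the union directly.
direct = prob(A | B | C)

print(direct, by_theorem_3)  # 11/13 11/13
```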
We will show one more fact in this section. Consider S and an event A in S. Denoting 
the set of points where the event A does not occur by Ā, it is clear that the events A 
and Ā are disjoint. So, by Theorem 2, P(A ∪ Ā) = P(A) + P(Ā) = P(S) = 1, which is most often 
written as 


Theorem 4: 

$$P(\overline{A}) = 1 - P(A).$$


Example 1.3.3 


Throw a pair of fair dice. What is the probability that the dice show different numbers? 
Here, it is convenient to let A be the event “the dice show different numbers.” Referring to 
the sample space shown in Figure 1.1, we compute P(Ā) since 

$$P(\overline{A}) = P(\text{the dice show the same numbers}) = \frac{6}{36} = \frac{1}{6}.$$

So 

$$P(A) = 1 - \frac{1}{6} = \frac{5}{6}.$$

This is easier than counting the 30 sample points out of 36 for which the dice show 
different numbers. 


The theorems we have developed so far appear to be fairly simple; the difficulty arises 
in applying them. 


EXERCISES 1.3 


1. Verify the probabilities in Example 1.3.2 by specifying the relevant sample points. 


2. A fair coin is tossed until a head appears. Find the probability this occurs in four or 
fewer tosses. 


3. A fair coin is tossed five times. Find the probability of obtaining 


(a) exactly three heads. 
(b) at most three heads. 


4. A manufacturer of pickup trucks is required to recall all the trucks manufactured in a 
given year for the repair of possible defects in the steering column and defects in the 
brake linings. Dealers have been notified that 3% of the trucks have defective steering 
only, and that 6% of the trucks have defective brake linings only. If 87% of the trucks 
have neither defect, what percentage of the trucks have both defects? 


5. A hat contains tags numbered 1, 2, 3, 4, and 5. A tag is drawn from the hat and it is 
replaced, then a second tag is drawn. Assume that the points in the sample space are 
equally likely. 

(a) Show the sample space. 

(b) Find the probability that the number on the second tag exceeds the number on the 
first tag. 

(c) Find the probability that the first tag has a prime number and the second tag has 
an even number. The number 1 is not considered to be a prime number. 


6. A fair coin is tossed four times. 

(a) Show a sample space for the experiment, showing each possible sequence of tosses. 

(b) Suppose the sample points are equally likely and that a running count is made of 
the number of heads and the number of tails tossed. What is the probability the 
heads count always exceeds the tails count? 

(c) If the last toss is a tail, what is the probability an even number of heads was tossed? 


7. In a sample space of two events is it possible to have P(A) = 1/2, P(A ∩ B) = 1/3 and 
P(B) = 1/4? 

8. If A and B are events in a sample space of two events, explain why P(A ∩ B) > P(A) - 
P(B). 


9. In testing the water supply for various cities in a state for two kinds of impurities 
commonly found in water, it was found that 20% of the water supplies had neither sort of 
impurity, 40% had an impurity of type A, and 50% had an impurity of type B. If a city 
is chosen at random, what is the probability its water supply has exactly one type of 
impurity? 

10. A die is loaded so that the probability a face turns up is proportional to the number on 
that face. If the die is thrown, what is the probability an even number occurs? 

11. Show that P(Ā ∩ B) = P(B) − P(A ∩ B). 

12. (a) Explain why P(A ∪ B) ≤ P(A) + P(B). 
(b) Explain why P(A ∪ B ∪ C) ≤ P(A) + P(B) + P(C). 

13. Find a formula for P(A or B) using the word “or” in an exclusive sense: that is, A or B 
means that event A occurs or event B occurs, but not both. 


14. The entering class in an engineering college has 34% who intend to major in Mechan- 
ical Engineering, 33% who indicate an interest in taking advanced courses in Mathe- 
matics as part of their major field of study, and 28% who intend to major in Electrical 
Engineering, while 23% have other interests. In addition, 59% are known to major in 
Mechanical Engineering or take advanced Mathematics while 51% intend to major in 
Electrical Engineering or take advanced Mathematics. Assuming that a student can 
major in only one field, what percent of the class intends to major in Mechanical Engi- 
neering or in Electrical Engineering, but shows no interest in advanced Mathematics? 


1.4 CONDITIONAL PROBABILITY AND INDEPENDENCE 


Example 1.4.1 


Suppose a card is drawn from a well-shuffled deck of 52 cards. What is the probability that 
the card is a Jack? If the sample space consists of a point for each card in the deck, the 
answer to the question is 4/52 = 1/13 since there are four Jacks in the deck. 

Now suppose the person choosing the card gives us some additional information. 
Specifically, suppose we are told that the drawn card is a face card. Now what is the 
probability that the card is a Jack? An appropriate sample space for the experiment 
becomes the set of 12 points consisting of all the possible face cards that could be selected: 


{JH, QH, KH, JD, QD, KD, JS, QS, KS, JC, QC, KC}. 


Considering each of these 12 outcomes to be equally likely, the probability the chosen 
card is a Jack is now 4/12 = 1/3. The given additional information that the card is a face card has 
altered the probability of the event in question. Generally, such additional information, or 
conditions, has the effect of changing the probability of an event as the conditions change. 
Specifically, the conditions often reduce the sample space and, hence, alter the probabilities 
on those points that satisfy the conditions. 

Let us denote by 


A: the event “the chosen card is a Jack” 
and 


B: the event “the chosen card is a face card”. 


Further, we will use the notation P(A|B) to denote the probability of the event A, given 
that the event B has occurred. We call P(A|B) the conditional probability of A given B. 

In this example, we see that P(A|B) = 4/12 = 1/3. 

Now we can establish a general result by reasoning as follows. Suppose the event B 
has occurred; while this reduces the sample space to those points in B, we cannot presume 
that the probability of the set of points in B is 1. However, if the probability of each point in 
B is divided by P(B), then the set of points in B has probability 1 and can therefore serve as 
a sample space. This division by a constant also preserves the relative probabilities of the 
points in the original sample space; if one point in the original sample space was k times 
as probable as another, it is still k times as probable as the other point in the new sample 
space. Clearly, P(A|B) accounts for the points in A ∩ B in the new sample space. We have 
found that 

$$P(A|B) = \frac{P(A \cap B)}{P(B)},$$

where we have presumed of course that P(B) ≠ 0. 
In the earlier example, P(A ∩ B) = 4/52 and P(B) = 12/52 so P(A|B) = 4/12 = 1/3 as before. 
In this example, P(A ∩ B) reduces to P(A), but this will not always be the case. 
We can also write this result as 

$$P(A \cap B) = P(B) \cdot P(A|B),$$

or, interchanging A and B, 

$$P(A \cap B) = P(A) \cdot P(B|A).$$


We call this result the multiplication theorem. 
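
The definition of conditional probability and the multiplication theorem can both be checked on the card example. The following Python sketch is an added illustration; the helpers `prob` and `conditional` and the deck encoding are assumptions made for this example, with all 52 cards equally likely.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["H", "D", "S", "C"]
deck = set(product(ranks, suits))

A = {card for card in deck if card[0] == "J"}              # the card is a Jack
B = {card for card in deck if card[0] in {"J", "Q", "K"}}  # the card is a face card

def prob(event):
    """Unconditional probability under equal likelihood."""
    return Fraction(len(event), len(deck))

def conditional(event, given):
    """P(event | given) = P(event and given) / P(given); needs P(given) != 0."""
    return prob(event & given) / prob(given)

print(prob(A))            # 1/13
print(conditional(A, B))  # 1/3

# Multiplication theorem, written both ways.
assert prob(A & B) == prob(B) * conditional(A, B)
assert prob(A & B) == prob(A) * conditional(B, A)
```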


Example 1.4.2 


A box of transistors has four good transistors mixed up with two bad transistors. A pro- 
duction worker, in order to sample the product, chooses two transistors at random, the first 
chosen transistor not being replaced before the second transistor is chosen. What is the 
probability that both transistors are good? 

If the events are 


A: the first transistor chosen is good 
and 


B: the second transistor chosen is good, 


then we want P(A ∩ B). 

Now P(A) = 4/6 while P(B|A) = 3/5 since the box, after the first good transistor is drawn, 
contains five transistors, three of which are good transistors. So the probability that both 
chosen transistors are good is 

$$P(A \cap B) = P(A) \cdot P(B|A) = \frac{4}{6} \cdot \frac{3}{5} = \frac{2}{5}$$


by the multiplication theorem. 


Example 1.4.3 


In the context of the earlier example, what is the probability the second transistor chosen is 
good? 

We need P(B). Now B can occur in two mutually exclusive ways: the first transistor 
is good and the second transistor is also good, or the first transistor is bad and the second 
transistor is good. So, 

$$\begin{aligned}
P(B) &= P[(A \cap B) \cup (\overline{A} \cap B)] \\
&= P(A) \cdot P(B|A) + P(\overline{A}) \cdot P(B|\overline{A}) \\
&= \frac{4}{6} \cdot \frac{3}{5} + \frac{2}{6} \cdot \frac{4}{5} = \frac{2}{3}.
\end{aligned}$$


We used the fact in this example that 


$$P(B) = P(A) \cdot P(B|A) + P(\overline{A}) \cdot P(B|\overline{A})$$

since B occurs when either A or Ā occurs. 
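
The results of Examples 1.4.2 and 1.4.3 can also be obtained by listing every ordered pair of draws without replacement. This Python sketch is an added illustration; the transistor labels and the helper `prob` are hypothetical names used only here, and each of the 30 ordered pairs is assumed equally likely.

```python
from fractions import Fraction
from itertools import permutations

# Four good (G) and two bad (B) transistors, labelled so they are distinguishable.
box = ["G1", "G2", "G3", "G4", "B1", "B2"]

# Ordered draws of two transistors without replacement: 6 * 5 = 30 points.
draws = list(permutations(box, 2))

def prob(predicate):
    """Probability that an ordered draw satisfies the predicate."""
    favourable = sum(1 for draw in draws if predicate(draw))
    return Fraction(favourable, len(draws))

def is_good(transistor):
    return transistor.startswith("G")

p_both_good = prob(lambda d: is_good(d[0]) and is_good(d[1]))        # P(A and B)
p_second_good = prob(lambda d: is_good(d[1]))                        # P(B)
p_bad_then_good = prob(lambda d: not is_good(d[0]) and is_good(d[1]))

print(p_both_good, p_second_good)  # 2/5 2/3
```

The two mutually exclusive ways for B to occur, `p_both_good` and `p_bad_then_good`, add up to `p_second_good`, which is the decomposition used in Example 1.4.3.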
This result can be generalized. Suppose the sample space consists of disjoint events so 
that 

$$S = A_1 \cup A_2 \cup \cdots \cup A_n,$$

where Aᵢ and Aⱼ have no sample points in common if i ≠ j, i, j = 1, 2, ..., n. 
Then if B is an event, 

$$\begin{aligned}
P(B) &= P[(A_1 \cap B) \cup (A_2 \cap B) \cup \cdots \cup (A_n \cap B)] \\
&= P(A_1 \cap B) + P(A_2 \cap B) + \cdots + P(A_n \cap B) \\
&= P(A_1) \cdot P(B|A_1) + P(A_2) \cdot P(B|A_2) + \cdots + P(A_n) \cdot P(B|A_n).
\end{aligned}$$

We have then 


Theorem (Law of Total Probability): If S = A₁ ∪ A₂ ∪ ··· ∪ Aₙ, where Aᵢ and Aⱼ have 
no sample points in common if i ≠ j, i, j = 1, 2, ..., n, then, if B is an event, 

$$P(B) = P(A_1) \cdot P(B|A_1) + P(A_2) \cdot P(B|A_2) + \cdots + P(A_n) \cdot P(B|A_n)$$

or 

$$P(B) = \sum_{i=1}^{n} P(A_i) \cdot P(B|A_i).$$


Example 1.4.4 


A supplier purchases 10% of its parts from factory A, 20% of its parts from factory B, and 
the remainder of its parts from factory C. Of these, 3% of A’s parts are defective, 2% 
of B’s parts are defective, and 1/2% of C’s parts are defective. What is the probability a 
randomly selected part is defective? 

Let P(A) denote the probability the part is from factory A and define P(B) and P(C) 
similarly. Let P(D) denote the probability an item is defective. Then, from the law of total 
probability, 


$$P(D) = P(A) \cdot P(D|A) + P(B) \cdot P(D|B) + P(C) \cdot P(D|C)$$

so 

$$P(D) = (0.10) \cdot (0.03) + (0.20) \cdot (0.02) + (0.70) \cdot (0.005) = 0.0105.$$

So 1.05% of the items are defective. 
We will encounter other uses of the law of total probability in the following 
examples. 
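
The law of total probability is a weighted sum, which makes it easy to express in code. The sketch below is an added illustration applied to the figures of Example 1.4.4; the function name `total_probability` and the dictionary layout are assumptions made for this example.

```python
from fractions import Fraction

def total_probability(priors, conditionals):
    """Return sum of P(A_i) * P(B | A_i) over a partition A_1, ..., A_n.

    priors:       maps each A_i to P(A_i); the values must sum to 1
    conditionals: maps each A_i to P(B | A_i)
    """
    if sum(priors.values()) != 1:
        raise ValueError("the events A_i must partition the sample space")
    return sum(priors[k] * conditionals[k] for k in priors)

# Example 1.4.4: share of parts from each factory and each factory's defect rate.
share = {"A": Fraction(10, 100), "B": Fraction(20, 100), "C": Fraction(70, 100)}
defect_rate = {"A": Fraction(3, 100), "B": Fraction(2, 100), "C": Fraction(1, 200)}

p_defective = total_probability(share, defect_rate)
print(p_defective, float(p_defective))  # 21/2000 0.0105
```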


Example 1.4.5 


Suppose, in the context of Examples 1.4.2 and 1.4.3, we are given that the second chosen 
transistor is good. What is the probability the first was also good? 
Using the events A and B from those examples, we want to find P(A|B). That is 

$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$

From Example 1.4.2, P(A ∩ B) = 4/6 · 3/5 = 2/5, and we found in Example 1.4.3 
that P(B) = 2/3, so 

$$P(A|B) = \frac{2/5}{2/3} = \frac{3}{5}.$$


When the earlier results are combined, we see that 


$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \cdot P(B|A)}{P(A) \cdot P(B|A) + P(\overline{A}) \cdot P(B|\overline{A})}. \tag{1.2}$$


This result is sometimes known as Bayes’ theorem. 
The theorem can easily be extended to three or more mutually disjoint events. 


Theorem (Bayes’ theorem): If S = A₁ ∪ A₂ ∪ ··· ∪ Aₙ where Aᵢ and Aⱼ have no sample 
points in common if i ≠ j then, if B is an event, 

$$P(A_i|B) = \frac{P(A_i \cap B)}{P(B)},$$

$$P(A_i|B) = \frac{P(A_i) \cdot P(B|A_i)}{P(A_1) \cdot P(B|A_1) + P(A_2) \cdot P(B|A_2) + \cdots + P(A_n) \cdot P(B|A_n)},$$

and 

$$P(A_i|B) = \frac{P(A_i) \cdot P(B|A_i)}{\sum_{j=1}^{n} P(A_j) \cdot P(B|A_j)}.$$
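
Bayes’ theorem amounts to normalizing the products P(Aᵢ) · P(B|Aᵢ) by their sum. The Python sketch below is an added illustration applied to the transistor numbers of Example 1.4.5; the function name `bayes` and the dictionary keys are hypothetical names chosen for this example.

```python
from fractions import Fraction

def bayes(priors, likelihoods):
    """Return P(A_i | B) for every A_i in a partition of the sample space.

    priors:      maps each A_i to P(A_i)
    likelihoods: maps each A_i to P(B | A_i)
    """
    joint = {k: priors[k] * likelihoods[k] for k in priors}  # P(A_i and B)
    p_b = sum(joint.values())                                # law of total probability
    return {k: joint[k] / p_b for k in joint}

# Example 1.4.5: A = first transistor good, B = second transistor good.
priors = {"first good": Fraction(4, 6), "first bad": Fraction(2, 6)}
likelihoods = {"first good": Fraction(3, 5), "first bad": Fraction(4, 5)}

posterior = bayes(priors, likelihoods)
print(posterior["first good"])  # 3/5
```

The posterior values always sum to 1, since they are the shares of P(B) contributed by each Aᵢ.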


Rather than remember this result, it is useful to look at Bayes’ theorem in a geometric 
way; it is not nearly as difficult as it may appear. This will first be illustrated using the 
current example. 


Draw a square of side 1; as shown in Figure 1.6, divide the horizontal axis in proportion 
to P(A) and P(Ā), in this case (returning to the context of Example 1.4.5) in the proportions 
4/6 to 2/6. Along the vertical axis the conditional probabilities are shown. The vertical axis 
shows P(B|A) = 3/5 and P(B|Ā) = 4/5, respectively. 

The shaded area above P(A) then shows P(A) · P(B|A). The total shaded area then 
shows P(B) = 4/6 · 3/5 + 2/6 · 4/5 = 2/3. The doubly shaded region is the proportion of the 
shaded area arising from the occurrence of A, which is P(A|B). We see that this is 

$$P(A|B) = \frac{\frac{4}{6} \cdot \frac{3}{5}}{\frac{2}{3}} = \frac{3}{5},$$

yielding the same result found using Bayes’ theorem. 
Figure 1.7 shows a geometric view of the general situation. 

Figure 1.6 Diagram for Example 1.4.5. 

Figure 1.7 A geometric view of Bayes’ theorem. 


Bayes’ theorem then simply involves the calculation of areas of rectangles. 


Example 1.4.6 


According to the New York Times (September 5, 1987), a test for the presence of 
the HIV virus exists that gives a positive result (indicating the virus) with cer- 
tainty if a patient actually has the virus. However, associated with this test, as with 
most tests, there is a false positive rate, that is, the test will sometimes indicate 
the presence of the virus in patients actually free of the virus. This test has a false 
positive rate of 1 in 20,000. So the test would appear to be very sensitive. Assuming 
now that 1 person in 10,000 is actually HIV positive, what proportion of patients 
for whom the test indicates HIV actually have the HIV virus? The answer may be 
surprising. 

Figure 1.8 AIDS example. 

A picture (greatly exaggerated so that the relevant areas can be seen) is shown in 
Figure 1.8. 


Define the events as 


A: patient has AIDS, 


T⁺: test indicates patient has AIDS. 


Then P(A) = 0.0001; P(T⁺|A) = 1; P(T⁺|Ā) = 1/20,000 from the data given. We are 
interested in P(A|T⁺). So, from Figure 1.8, we see that 

$$P(A|T^{+}) = \frac{(0.0001) \cdot 1}{(0.0001) \cdot 1 + (0.9999) \cdot \frac{1}{20{,}000}} = \frac{20{,}000}{29{,}999}.$$


We could also of course apply Bayes’ theorem to find that 

$$\begin{aligned}
P(A|T^{+}) &= \frac{P(A \cap T^{+})}{P(T^{+})} = \frac{P(A \cap T^{+})}{P[(A \cap T^{+}) \cup (\overline{A} \cap T^{+})]} \\
&= \frac{P(A) \cdot P(T^{+}|A)}{P(A) \cdot P(T^{+}|A) + P(\overline{A}) \cdot P(T^{+}|\overline{A})} \\
&= \frac{(0.0001) \cdot 1}{(0.0001) \cdot 1 + (0.9999) \cdot \frac{1}{20{,}000}} \\
&= \frac{20{,}000}{29{,}999},
\end{aligned}$$

giving the same result as that found using simple geometry. 

Figure 1.9 P(A|T⁺) as a function of P(A). 
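
The calculation in Example 1.4.6 can be repeated exactly, and for other values of P(A), with a short function. This Python sketch is an added illustration; the function name `posterior_given_positive` and its parameter names are assumptions made for this example.

```python
from fractions import Fraction

def posterior_given_positive(prior, p_pos_given_disease, p_pos_given_healthy):
    """P(A | T+) by Bayes' theorem for a two-event partition (A, not A)."""
    numerator = prior * p_pos_given_disease
    denominator = numerator + (1 - prior) * p_pos_given_healthy
    return numerator / denominator

# Figures from Example 1.4.6: P(A) = 1/10,000, P(T+|A) = 1, P(T+|not A) = 1/20,000.
p = posterior_given_positive(Fraction(1, 10_000), Fraction(1), Fraction(1, 20_000))
print(p, float(p))  # 20000/29999, roughly 0.667

# The same test applied to populations with other values of P(A).
for prior in (Fraction(1, 100_000), Fraction(1, 10_000), Fraction(1, 1_000)):
    print(prior, float(posterior_given_positive(prior, 1, Fraction(1, 20_000))))
```

The loop shows how strongly P(A|T⁺) depends on P(A) even though the test itself is unchanged.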


At first glance, the test would appear to be very sensitive due to its small false positive 
rate, but only two-thirds of those people testing positive would actually have the virus, 
showing that widespread use of the test, while detecting many cases of HIV, would also 
falsely detect the virus in about one-third of the population who test positive. This risk may 
be unacceptably high. 

A graph of P(A|T⁺) (shown in Figure 1.9) shows that this probability is highly 
dependent on P(A). 

The graph shows that P(A|T⁺) increases as P(A) increases, and that P(A|T⁺) is very 
large even for small values of P(A). For example, if we desire P(A|T⁺) to be ≥ 0.9, then we 
must have P(A) ≥ 0.00045 (approximately). 

The sensitivity of the test may incorrectly be associated with P(T⁺|A). The patient, 
however, is concerned with P(A|T⁺). This example shows how easy it is to confuse P(A|T⁺) 
with P(T⁺|A). 

Let us generalize the HIV example to a more general medical test in this way: assume 
a test has a probability p of indicating a disease among patients actually having the disease; 
assume also that the test indicates the presence of the disease with probability 1 — p among 
patients not having the disease. Finally, suppose the incidence rate of the disease is r. 

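If T⁺ denotes that the test indicates the disease, and if A denotes the occurrence of the 
disease, then, applying Bayes’ theorem with P(A) = r, P(T⁺|A) = p, and P(T⁺|Ā) = 1 − p, 

$$P(A|T^{+}) = \frac{r \cdot p}{r \cdot p + (1 - r) \cdot (1 - p)}.$$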


For example, if p = 0.95 and r = 0.005 (indicating that the test is 95% accurate on both 
those who have the disease and those who do not, and that 5 patients out of 1000 actually 
have the disease), then P(A|T⁺) = 0.087156. Since P(Ā|T⁺) = 0.912844, a positive result 
on the test appears to indicate the absence, not the presence of the disease! 
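
The generalized medical test depends on only two numbers, p and r, so its behaviour is easy to explore numerically. The sketch below is an added illustration; the function name `posterior` is an assumption, and the formula coded is Bayes’ theorem with P(A) = r, P(T⁺|A) = p, and P(T⁺|Ā) = 1 − p as described in the text.

```python
def posterior(p, r):
    """P(A | T+) for a test that is correct with probability p on both the
    diseased and the healthy, when the incidence rate of the disease is r."""
    true_positive = r * p
    false_positive = (1 - r) * (1 - p)
    return true_positive / (true_positive + false_positive)

# The case discussed in the text: p = 0.95, r = 0.005.
print(round(posterior(0.95, 0.005), 6))      # 0.087156
print(round(1 - posterior(0.95, 0.005), 6))  # 0.912844

# Holding p = 0.95 fixed, the posterior rises quickly with the incidence rate.
for r in (0.005, 0.05, 0.21, 0.5):
    print(r, round(posterior(0.95, r), 4))
```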

This odd result is actually due to the small incidence rate of the disease. Figure 1.10 
shows P(A|T⁺) as a function of r assuming that p = 0.95. We see that P(A|T⁺) becomes 
quite large (>0.8) for r > 0.21. 

It is also interesting to see how r and p, varied together, affect P(A|T⁺). The surface is 
shown in Figure 1.11. The surface shows that P(A|T⁺) is large when the test is sensitive, 
that is, when P(T⁺|A) is large, or when the incidence rate r = P(A) is large. But there are 
also combinations of these values that give large values of P(A|T⁺): one of these is r = 0.2 
and P(T⁺|A) = 0.8 for then P(A|T⁺) = 1/2. 

Figure 1.10 P(A|T⁺) as a function of the incidence rate, r, if p = 0.95. 

Figure 1.11 P(A|T⁺) as a function of r, the incidence rate, and P(T⁺|A). 


Example 1.4.7 


A game show contestant is shown three doors, one of which conceals a valuable prize, 
while the other two are empty. The contestant is allowed to choose one door. Regardless 
of the choice made, at least one (i.e., exactly one or perhaps both) of the remaining doors 
is empty. The show host opens one door to show it empty. The contestant is now given the 
opportunity to switch doors. Should the contestant switch? 

The problem is often called the Monty Hall problem because of its origin on the tele- 
vision show “Let’s Make A Deal.” It has been written about extensively, possibly because 
of its nonintuitive answer and perhaps because people unwittingly change the problem in 
the course of thinking about it. 

The contestant clearly has a probability of 1/3 of choosing the prize if a random choice 
of the door is made. So a probability of 2/3 rests with the remaining two doors. The fact 
that one door is opened and revealed empty does not change these probabilities; hence the 
contestant should switch and will gain the prize with probability 2/3. 

Some think that showing the empty door gives information. Actually, it does not since 
the contestant already knows that at least one of the doors is empty. 

When the empty door is opened, the problem does not suddenly become a choice 
between two doors (one of which conceals the prize). This change in the problem ignores 
the fact that the game show host sometimes has a choice of one door to open and some- 
times two. Persons changing the problem in this manner may think, incorrectly, that the 
probability of choosing the prize is now 1/2, indicating that switching may have no effect 
in the long run; the strategy in reality has a great effect on the probability of choosing 
the prize. 

To analyze the situation, suppose that the contestant chooses the first door and the host 
opens the second door. Other possibilities are handled here by symmetry. Let Aᵢ, i = 1, 2, 3, 
denote the event “the prize is behind door i” and D denote the event “door 2 is opened.” The 
condition here is then D; we now calculate the probability the prize is behind door 3, that 
is, the probability the contestant will win if he switches. We assume that P(A₁) = P(A₂) = 
P(A₃) = 1/3. 

Then P(D|A₁) = 1/2, P(D|A₂) = 0, and P(D|A₃) = 1. 

The situation is shown in Figure 1.12. 


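It is clear from the shaded area in Figure 1.12 that the probability the contestant wins 
if the first choice is switched to door 3 is 

$$P(A_3|D) = \frac{\frac{1}{3} \cdot 1}{\frac{1}{3} \cdot \frac{1}{2} + \frac{1}{3} \cdot 0 + \frac{1}{3} \cdot 1} = \frac{2}{3},$$

which verifies our previous analysis. 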
This example illustrates that some events are highly dependent on others. We now turn 
our attention to events for which this is not so. 
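
The Monty Hall analysis can be checked in two ways: exactly, using the conditional probabilities given above, and approximately, by simulating many games. The Python sketch below is an added illustration; the function name `simulate_switching`, the fixed seed, and the use of a pseudorandom generator are assumptions. As in the text, the host is assumed to choose at random when two empty doors are available.

```python
import random
from fractions import Fraction

# Exact calculation: contestant holds door 1, D = "the host opens door 2".
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
p_d_given = {1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(1)}
p_d = sum(prior[i] * p_d_given[i] for i in prior)    # law of total probability
p_win_by_switching = prior[3] * p_d_given[3] / p_d   # Bayes' theorem
print(p_win_by_switching)  # 2/3

def simulate_switching(n_games, seed=0):
    """Fraction of simulated games won by a contestant who always switches."""
    rng = random.Random(seed)  # fixed seed: an assumption for repeatability
    doors = (1, 2, 3)
    wins = 0
    for _ in range(n_games):
        prize = rng.choice(doors)
        choice = rng.choice(doors)
        # The host opens an empty door other than the contestant's door.
        opened = rng.choice([d for d in doors if d != choice and d != prize])
        # Switching means taking the one door that is neither chosen nor opened.
        switched = next(d for d in doors if d != choice and d != opened)
        wins += (switched == prize)
    return wins / n_games

print(simulate_switching(100_000))
```

The simulated fraction should lie near 2/3, in agreement with the exact value, although it will vary slightly with the seed.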


Figure 1.12 Diagram for the Monty Hall problem. 


Independence 


We have found that P(A ∩ B) = P(A) · P(B|A). Occasionally, the occurrence of A has no 
effect on the occurrence of B so that P(B|A) = P(B). If this is the case, we call A and B 
independent events. When A and B are independent, we have P(A ∩ B) = P(A) · P(B). 


Definition Events A and B are called independent events if P(A ∩ B) = P(A) · P(B). 

If we draw cards from a deck, replacing each drawn card before the next card is drawn, then 
the events denoting the cards drawn are clearly independent since the deck is full before 
each drawing and each drawing occurs under exactly the same conditions. If the cards are 
not replaced, however, then the events are not independent. 


For three events, say A, B, and C, we define the events as independent if 


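$$\begin{aligned}
P(A \cap B) &= P(A) \cdot P(B), \\
P(A \cap C) &= P(A) \cdot P(C), \\
P(B \cap C) &= P(B) \cdot P(C), \text{ and} \\
P(A \cap B \cap C) &= P(A) \cdot P(B) \cdot P(C).
\end{aligned} \tag{1.3}$$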


The first three of these conditions establish that the events are independent in pairs, 
so we call events satisfying these three conditions pairwise independent. Example 1.4.8 
will show that events satisfying these three conditions may not satisfy the fourth condition, 
so pairwise independence does not determine independence. 

We also note that there is some confusion between independent events and mutually 
exclusive events. Often people speak of these as, “having no effect on each other,” but that 
is not a precise characterization in either case. Note that while mutually exclusive events 
cannot occur together, independent events must be able to occur together. To be specific, 
suppose that neither P(A) nor P(B) is 0, and that A and B are mutually exclusive. Then 
P(A ∩ B) = 0 ≠ P(A) · P(B). Hence, A and B cannot be independent. So if A and B are mutually 
exclusive, then they cannot be independent. This is equivalent to the statement that if A 
and B are independent, then they cannot be mutually exclusive, but the reader may enjoy 
establishing this from first principles as well. 


Example 1.4.8 


This example shows that pairwise independent events are not necessarily independent. 

A fair coin is tossed four times. Consider the events A, the first coin shows a head; B, 
the third coin shows a tail; and C, there are equal numbers of heads and tails. Are these 
events independent? 

Suppose the sample space consists of the 16 points showing the tosses of the coins in 
order. The sample space, indicating the events that occur at each point, is as follows: 


Point   Event 
HHHH    A 
HHHT    A 
HHTH    A, B 
THHH 
HTHH    A 
HHTT    A, B, C 
HTHT    A, C 
THHT    C 
THTH    B, C 
HTTH    A, B, C 
TTHH    C 
TTTH    B 
TTHT 
THTT    B 
HTTT    A, B 
TTTT    B 


Then P(A) = 1/2 and P(B) = 1/2 while C consists of the 6 points with exactly two 
heads and two tails, so P(C) = 6/16 = 3/8. 

Now P(A ∩ B) = 4/16 = 1/4 = P(A) · P(B); P(A ∩ C) = 3/16 = P(A) · P(C); and 
P(B ∩ C) = 3/16 = P(B) · P(C), so the events A, B, and C are pairwise independent. 

Now A ∩ B ∩ C consists of the two points HTTH and HHTT with probability 2/16 = 
1/8, while P(A) · P(B) · P(C) = 1/2 · 1/2 · 3/8 = 3/32. 

Hence, P(A ∩ B ∩ C) ≠ P(A) · P(B) · P(C), so A, B, and C are not independent. 

Formulas (1.3) also show that establishing only that P(A ∩ B ∩ C) = P(A) · P(B) · P(C) 
is not sufficient to establish the independence of events A, B, and C. 
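
The four conditions in (1.3) can be tested mechanically for the events of Example 1.4.8. This Python sketch is an added illustration; the helper `prob` and the string encoding of the tosses are assumptions made for this example, with all 16 sequences equally likely.

```python
from fractions import Fraction
from itertools import product

# The 16 equally likely sequences of four tosses, e.g. "HTTH".
outcomes = {"".join(tosses) for tosses in product("HT", repeat=4)}

A = {w for w in outcomes if w[0] == "H"}        # first toss is a head
B = {w for w in outcomes if w[2] == "T"}        # third toss is a tail
C = {w for w in outcomes if w.count("H") == 2}  # equal numbers of heads and tails

def prob(event):
    """Probability of an event under the equally-likely assumption."""
    return Fraction(len(event), len(outcomes))

# The first three conditions of (1.3): independence in pairs.
pairwise = all(prob(X & Y) == prob(X) * prob(Y)
               for X, Y in ((A, B), (A, C), (B, C)))

# The fourth condition of (1.3).
triple = prob(A & B & C) == prob(A) * prob(B) * prob(C)

print(pairwise, triple)                                 # True False
print(prob(A & B & C), prob(A) * prob(B) * prob(C))     # 1/8 3/32
```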


EXERCISES 1.4 


1. In Example 1.4.6, verify P(A|T*) and P(A|T*). 

2. Example 1.4.8 defines the events D: the first head occurs on an even numbered toss and 
E: at least three heads occur. Are D and E independent? 

3. Box I contains 4 green and 5 brown marbles. Box II contains 6 green and 8 brown 
marbles. A marble is chosen from Box I and placed in Box II, then a marble is drawn 
from Box II. 

(a) What is the probability the second marble chosen is green? 
(b) If the second marble chosen is green, what is the probability a brown marble was 
transferred? 

4. A football team wins its weekly game with probability 0.7. Suppose the outcomes of 
games on 3 successive weekends are independent. What is the probability the number 
of wins exceeds the number of losses? 

5. Three manufacturers of floppy disks, A, B, and C, produce 15%, 25%, and 60% of the 
floppy disks made, respectively. Manufacturer A produces 5% defective disks, man- 
ufacturer B produces 7% defective disks, and manufacturer C produces 4% defective 
disks. 

(a) What proportion of floppy disks are defective? 
(b) If a floppy disk is found to be defective, what is the probability it came from
manufacturer B?


6. A chest contains three drawers, each containing a coin. One coin is silver on both sides,
one is gold on both sides, and the third is silver on one side and gold on the other side.
A drawer is chosen at random and one face of the coin is shown to be silver. What is
the probability that the other side is silver also?


7. If A and B are independent events in a sample space, show that

$$P(A \cup B) = P(B) + P(A) \cdot P(\bar{B}) = P(A) + P(\bar{A}) \cdot P(B).$$


8. In a sample space, events A and B are independent, events B and C are mutually
exclusive, and A and C are independent. If $P(A \cup B \cup C) = 0.9$ while P(B) = 0.5 and
P(C) = 0.3, find P(A).


9. If $P(A \cup B) = 0.4$ and P(A) = 0.3, find P(B) if 


(a) A and B are independent. 

(b) A and B are mutually exclusive. 

10. A coin, loaded so that the probability it shows heads when tossed is 3/4, is tossed twice.
Let the events A, B, and C be “first toss is heads,” “second toss is heads,” and “tosses
show the same face,” respectively.

(a) Are the events A and B independent?

(b) Are the events A and $B \cup C$ independent?

(c) Are the events A, B, and C independent?

11. Three missiles, whose probabilities of hitting a target are 0.7, 0.8, and 0.9, respectively,
are fired at a target. Assuming independence, what is the probability the target is hit?

12. A student takes a driving test until it is passed. If the probability the test is passed on
any attempt is 4/7 and if the attempts are independent, what is the probability the test
is taken an even number of times?

13. (a) Let p be the probability of obtaining a 5 at least once in n independent tosses of a 
die. What is the least value of n so that p > 1/2? 

(b) Generalize the result in part (a): suppose an event has probability p of occurring 
at any one of n independent trials of an experiment. What is the least value of n so 
that the probability the event occurs at least once is > r? 

(c) Graph the surface in part (b), showing n as a function of p and r. 

14. Box I contains 7 red and 3 black balls; Box II contains 4 red and 5 black balls. After a
randomly selected ball is transferred from Box I to Box II, 2 balls are drawn from Box
II without replacement. Given that the two balls are red, what is the probability a black
ball was transferred?

15. In rolling a fair die, what is the probability of rolling 1 before rolling an even number? 

16. (a) There is a fifty-fifty chance that firm A will bid for the construction of a bridge. 
Firm B submits a bid and the probability that it will get the job is 2/3, provided 
firm A does not bid; if firm A submits a bid, the probability firm B gets the job is 
1/5. Firm B is awarded the job; what is the probability firm A did not bid? 

(b) In part (a), suppose now that the probability firm B gets the job if firm A bids on 
the job is p. Graph the probability that firm A did not bid given that B gets the job 
as a function of p. 



(c) Generalize parts (a) and (b) further and suppose that the probability that B gets the 
job given that firm A bids on the job is r. Graph the surface showing the probability 
firm A did not bid, given that firm B gets the job as a function of p and r. 

17. In a sample space, events A and B have probabilities P(A) = P(B) = 1/2, and
$P(A \cup B) = 2/3$.

(a) Are A and B mutually exclusive? 

(b) Are A and B independent? 

(c) Calculate P(A 2 B). 

(d) Calculate P(A 1 B). 

18. Suppose that events A, B, and C are independent with P(A) = 1/4, P(B) = 1/2, and
$P(A \cup B \cup C) = 3/4$. Find P(C).

19. A fair coin is tossed until the same face occurs twice in a row, but it is tossed no more
than four times. If the experiment is over no later than the third toss, what is the
probability that it is over by the second toss?

20. A collection of 65 coins contains one with two heads; the remainder of the coins are
fair. If a coin, selected at random from the collection, turns up heads six times in six
tosses, what is the probability that it is the two-headed coin?

21. Three distinct methods, A, B, and C, are available for teaching a certain industrial skill.
The failure rates are 30%, 20%, and 10%, respectively. However, due to costs, A is used
twice as frequently as B, which is used twice as frequently as C.

(a) What is the overall failure rate in teaching the skill? 

(b) A worker is taught the skill, but fails to learn it correctly. What is the probability 
he was taught by method A? 

22. Sixty percent of new drivers have had driver education. During their first year of
driving, drivers without driver education have a probability 0.08 of having an accident,
but new drivers with driver education have a 0.05 probability of having an accident.
What is the probability a new driver with no accidents during the first year had driver
education?

23. Events A, B, and C have P(A) = 0.3, P(B) = 0.2, and P(C) = 0.4. Also A and B are
mutually exclusive; A and C are independent and B and C are independent. Find the
probability that exactly one of the events A, B, or C occurs.

24. A set consists of the six possible arrangements of the letters a, b, and c, as well as
the points (a, a, a), (b, b, b), and (c, c, c). Let $A_k$ be the event “letter a is in position k”
for k = 1, 2, 3. Show that the events $A_k$ are pairwise independent, but that they are not
independent.

25. Assume that the probability a first-born child is a boy is p, and that the sex of subsequent
children follows a chance mechanism so that the probability the next child is the same
sex as the previous child is r.

(a) Let $P_n$ denote the probability that the nth child is a boy. Find $P_i$, i = 1, 2, 3, in
terms of p and r.

(b) Are the events $A_i$: “the ith child is a boy”, i = 1, 2, 3, mutually independent?

(c) Find a value for r so that A, and A, are independent. 

26. A message is coded into the binary symbols 0 and 1 and the message is sent over a
communication channel. The probability a 0 is sent is 0.4 and the probability a 1 is
sent is 0.6. The channel, however, has a random error that changes a 1 to a 0 with
probability 0.2 and changes a 0 to a 1 with probability 0.1.

(a) What is the probability a 0 is received?
(b) If a 1 is received, what is the probability a 0 was sent?


27. (a) Hospital patients with a certain disease are known to recover with probability 1/2 
if they do not receive a certain drug. The probability of recovery is 3/4 if the drug 
is used. Of 100 patients, 10 are selected to receive the drug. If a patient recovers, 
what is the probability the drug was used? 

(b) In part (a), let the probability the drug is used be p. Graph the probability the drug 
was used given the patient recovers as a function of p. 


(c) Find p if the probability the drug was used given that the patient recovers is 1/2. 


28. Two people each toss four fair coins. What is the probability they each throw the same 
number of heads? 


29. In sample surveys, people may be asked questions which they regard as sensitive and
so they may or may not answer them truthfully. An example might be, “Are you using
illegal drugs?” If it is important to discover the real proportion of illegal drug users in
the population, the following procedure, often called a randomized response technique,
may be used.

The respondent is asked to flip a fair coin and not reveal the result to the questioner. 
If the result is heads, then the respondent answers the question, “Is your Social Security 
number even?” If the coin comes up tails, the respondent answers the sensitive question. 
Clearly the questioner cannot tell whether a response of “yes” is a consequence of 
illegal drug use or of an even Social Security number. Explain, however, how the results 
of such a survey to a large number of respondents can be used to find accurately the 
percentage of the respondents who are users of illegal drugs. 


30. (a) The individual events in a series of independent events have probabilities
$1/2, (1/2)^2, (1/2)^3, \ldots, (1/2)^n$.
Show that the probability that at least one of the events occurs approaches 0.711 as
$n \to \infty$.

(b) Show, if the probabilities of the events are $1/3, (1/3)^2, (1/3)^3, \ldots, (1/3)^n$, that the
probability at least one of the events occurs approaches 0.440 as $n \to \infty$.

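(c) Show, if the probabilities of the events are $p, p^2, p^3, \ldots, p^n$, that the probability
none of the events occurs can be very well approximated by the function
$1 - p - p^2 + p^5 + p^7$ for $1/11 \le p \le 1/2$, so that the probability at least one of the
events occurs is approximately $p + p^2 - p^5 - p^7$.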

31. (a) If events A and B are independent, show that

1. $\bar{A}$ and B are independent.
2. A and $\bar{B}$ are independent.
3. $\bar{A}$ and $\bar{B}$ are independent.

(b) Show that events A and B are independent if and only if $P(A|B) = P(A|\bar{B})$.


32. A lie detector is accurate 3/4 of the time. That is, if a person is telling the truth, the
lie detector indicates he is telling the truth with probability 3/4 while if the person is
lying, the lie detector indicates that he is lying with probability 3/4. Assume that a
person taking the lie detector test is unable to influence its results and also assume that
95% of the people taking the test tell the truth. What is the probability that a person is
lying if the lie detector indicates that he is lying?


1.5 SOME EXAMPLES 


We now show two examples of probability problems that have interesting results which 
may counter intuition. 


Example 1.5.1 (The Birthday Problem) 


This problem exists in many variations in the literature on probability and has been written 
about extensively. The basic problem is this: There are n people in a room; what is the 
probability that at least two of them have the same birthday? 

Let A denote the event “at least two people have the same birthday”; we want to
find P(A). It is easier in this case to calculate $P(\bar{A})$ (the probability the birthdays are all
distinct) rather than P(A). To find $P(\bar{A})$, note that the first person can have any day as
a birthday. The birthday of the next person cannot match that of the first person; this
has probability $\frac{364}{365}$; the birthday of the third person cannot match that of either of the
first two people; this has probability $\frac{363}{365}$, and so on. So, multiplying these conditional
probabilities,

$$P(\bar{A}) = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots \frac{365 - (n - 1)}{365}.$$

It is easy with a computer algebra system to calculate exact values for $P(A) = 1 - P(\bar{A})$
for various values of n:


 n    P(A)         n    P(A)         n    P(A)
 2    0.002740    18    0.346911    34    0.795317
 3    0.008204    19    0.379119    35    0.814383
 4    0.016356    20    0.411438    36    0.832182
 5    0.027136    21    0.443688    37    0.848734
 6    0.040462    22    0.475695    38    0.864068
 7    0.056236    23    0.507297    39    0.878220
 8    0.074335    24    0.538344    40    0.891232
 9    0.094624    25    0.568700
10    0.116948    26    0.598241
11    0.141141    27    0.626859
12    0.167025    28    0.654461
13    0.194410    29    0.680969
14    0.223103    30    0.706316
15    0.252901    31    0.730455
16    0.283604    32    0.753348
17    0.315008    33    0.774972
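
Readers without a computer algebra system can reproduce entries of this table with a few lines of code. The sketch below multiplies the conditional probabilities described above; the function name `p_shared_birthday` and the `days` parameter are our own choices, and 365 equally likely birthdays are assumed.

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), assuming equally likely days."""
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)st person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


# Print a few rows in the format of the table above.
for n in (2, 10, 22, 23, 40):
    print(n, f"{p_shared_birthday(n):.6f}")
```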


We see that P(A) increases rather rapidly; it exceeds 1/2 for n = 23, a fact that surprises
many, most people guessing that the value of n needed to make $P(A) \ge 1/2$ is much larger. In
thinking about this, note that the problem says that any two people in the room can share
any birthday. If some specific date comes to mind, such as August 2, then, since the probability
a particular person’s birthday is not August 2 is $\frac{364}{365}$, the probability that at least one
person in a group of n people has that specific birthday is

$$1 - \left(\frac{364}{365}\right)^n.$$

It is easy to solve this for some specific probability. We find, for example, that for this
probability to reach 1/2, n = 253 people are necessary.
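
The value n = 253 can be checked by solving $1 - (364/365)^n \ge 1/2$ for n with logarithms. This is a small sketch under the same assumption of 365 equally likely birthdays; the function names are illustrative only.

```python
import math


def p_specific_birthday(n, days=365):
    """P(at least one of n people has one particular, pre-chosen birthday)."""
    return 1.0 - ((days - 1) / days) ** n


def least_group_size(target=0.5, days=365):
    """Smallest n for which p_specific_birthday(n) is at least `target`."""
    # 1 - q**n >= target  is equivalent to  n >= log(1 - target) / log(q).
    q = (days - 1) / days
    return math.ceil(math.log(1.0 - target) / math.log(q))


print(least_group_size(0.5))
```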


We show a graph, in Figure 1.13, of P(A) for n = 1, 2,3,...,40. The graph indicates 
that P(A) increases quite rapidly as n, the number of people, increases. 


Figure 1.13 The birthday problem as a function of n, the number of people in the group.


It would appear that P(A) might be approximated by a polynomial function of n. To
consider how such functions can be constructed would be a diversion now, so we will not
discuss it. For now, we state that the approximating cubic polynomial found by applying
a principle known as least squares is

$$f(n) = -6.44778 \cdot 10^{-3} - 4.54359 \cdot 10^{-5} \cdot n + 1.51787 \cdot 10^{-3} \cdot n^2 - 2.40561 \cdot 10^{-5} \cdot n^3.$$


It can be shown that f(n) fits P(A) quite well in the range $2 \le n \le 40$. For example,
if n = 13, P(A) = 0.194410 while f(13) = 0.196630; if n = 27, P(A) = 0.626859 while
f(27) = 0.625357. A graph of P(A) and the approximating function f(n) is shown in
Figure 1.14. The principle of least squares will be considered in Section 4.16.
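
The two comparisons quoted above can be checked numerically. In this sketch the coefficients of the cubic are copied from the text, while the exact probability is recomputed from the product formula; the function names are our own.

```python
def f(n):
    """The least squares cubic approximation to P(A) quoted in the text."""
    return (-6.44778e-3
            - 4.54359e-5 * n
            + 1.51787e-3 * n**2
            - 2.40561e-5 * n**3)


def p_shared_birthday(n, days=365):
    """Exact P(at least two of n people share a birthday)."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


# Compare the approximation with the exact value at the two quoted points.
for n in (13, 27):
    print(n, f"{p_shared_birthday(n):.6f}", f"{f(n):.6f}")
```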


Figure 1.14 Polynomial approximation to the birthday data.


Example 1.5.2


How many people must be in a group so that the probability at least two of them have
birthdays within at most one day of each other is at least 1/2?

Suppose there are n people in the group, and that A represents the event “at least two
people have birthdays within at most one day of each other.” If a person’s birthday is August
2, for example, then the second person’s birthday must not fall on August 1, 2, or 3, giving
362 choices for the second person’s birthday. The third person, however, has either 359 or
360 choices, depending on whether the second person’s birthday is August 4 or July 31 or
some other day that has not previously been excluded from the possibilities. We give then
an approximate solution as


$$P(\bar{A}) = \frac{365 \cdot 362 \cdot 359 \cdots (368 - 3n)}{365 \cdot 365 \cdots 365}.$$


We seek $P(A) = 1 - P(\bar{A})$. It is easy to make a table of values of n and P(A) with a
computer algebra system.


 n    P(A)
 2    0.008219
 3    0.024522
 4    0.048575
 5    0.079855
 6    0.117669
 7    0.161181
 8    0.209442
 9    0.261424
10    0.316058
11    0.372273
12    0.429026
13    0.485341
14    0.540332
15    0.593226
16    0.643376


So 14 people are sufficient to make the probability that at least two of the birthdays
differ by at most one day exceed 1/2. In the previous example, we found that a group of
23 people was sufficient to make the probability that at least two of them shared the same
birthday exceed 1/2. For a group of 23 people, the probability is approximately 0.8915 that
at least two of them have birthdays that differ by at most one day.
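
The approximate solution can be tabulated in the same way as the original birthday problem. The sketch below implements only the approximation given above (each new person is assumed to exclude three further days), not an exact solution; the function name is our own.

```python
def p_near_birthday_approx(n, days=365):
    """Approximate P(at least two of n birthdays lie within one day of each other)."""
    p_none_close = 1.0
    for k in range(n):
        # Each earlier person is assumed to rule out three days: 365, 362, 359, ...
        p_none_close *= (days - 3 * k) / days
    return 1.0 - p_none_close


for n in (2, 13, 14, 23):
    print(n, f"{p_near_birthday_approx(n):.6f}")
```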


Example 1.5.3 (Mowing the Lawn) 


Jack and his daughter, Kaylyn, choose who will mow the lawn by a random pro- 
cess: Jack has one green and two red marbles in his pocket; two are selected at random. If 
the colors match, Jack mows the lawn, otherwise, Kaylyn mows the lawn. Is the game fair? 

The sample space here is most easily shown by a diagram containing the colors of 
the marbles as vertices and the edges as the two marbles chosen. Assuming that the three 
possible samples are equally likely, then two of them lead to Kaylyn mowing the lawn, 
while Jack only mows it 1/3 of the time. If we mean by the word “fair” that each mows the 
lawn with probability 1/2, then the game is clearly unfair. 


Three marbles in the lawn mowing example.


If we are allowed to add marbles to Jack’s pocket, can the game be made fair? The 
reader might want to think about this before proceeding. 

What if a green marble is added? Then the sample space becomes all the sides and 
diagonals of a square: 


Four marbles in the lawn mowing example.

Although there are now six possible samples, four of them involve different colors
while only two of them involve the same colors. So the probability that the colors differ
is $\frac{4}{6} = \frac{2}{3}$; the addition of the green marble has not altered the game at all! The reader will
easily verify that the addition of a red marble, rather than a green marble, will produce a
fair game.

The problem of course is that, while the number of red and green marbles is important, 
the relevant information is the number of sides and diagonals of the figure produced since 
these represent the samples chosen. If we wish to find other compositions of marbles in 
Jack’s pocket that make the game fair, we need to be able to count these sides and diagonals. 
We now show how to do this. 

Consider a figure with n vertices, as shown in Figure 1.15.

In order to count the number of sides and diagonals, choose one of the n vertices. Now,
to choose a side or diagonal, choose any of the other $n - 1$ vertices and join them. We
have then $n \cdot (n - 1)$ choices. Since it does not matter which vertex is chosen first, we have
counted each side or diagonal twice. We conclude that there are $\frac{n \cdot (n - 1)}{2}$ sides and diagonals.


Figure 1.15 n marbles for the lawn mowing problem.


This is also called the number of combinations of n distinct objects chosen two at a time,
which we denote by the symbol $\binom{n}{2}$. So

$$\binom{n}{2} = \frac{n \cdot (n - 1)}{2}.$$

If the game is to be fair, and if we have r red and g green marbles, then $\binom{r}{2}$ and $\binom{g}{2}$ represent
the number of sides and diagonals connecting two red or two green marbles, respectively.
We want r and g so that the sum of these is $\frac{1}{2}$ of the total number of sides and diagonals,
that is, we want r and g so that

$$\binom{r}{2} + \binom{g}{2} = \frac{1}{2} \cdot \binom{r + g}{2}.$$

The reader can verify that r = 6, g = 3 will satisfy the above equation as will r = 10,
g = 6. The reader may also enjoy trying to find a general pattern for r and g before reading
problem 3 in Exercises 1.5.
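
A short search can be used to check the fairness condition and to list further compositions of Jack’s pocket. This is a sketch only: the function name `is_fair`, the restriction to r > g, and the search bound of 30 marbles are arbitrary choices made for illustration.

```python
from math import comb


def is_fair(r, g):
    """True when matching colors make up exactly half of all two-marble samples."""
    # Doubling the left side avoids fractions: 2 * (C(r,2) + C(g,2)) = C(r+g,2).
    return 2 * (comb(r, 2) + comb(g, 2)) == comb(r + g, 2)


# All fair compositions with more red than green marbles and r below 30.
fair_pairs = [(r, g) for r in range(1, 30) for g in range(1, r) if is_fair(r, g)]
print(fair_pairs)
```

The pairs (6, 3) and (10, 6) mentioned above appear in the list, as does the composition with three red marbles and one green marble.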
EXERCISES 1.5


1. In the birthday problem, verify that, in a group of 23 people, the probability
that at least two people have birthdays differing by at most 1 day is 0.8915.

2. In the birthday problem, verify that the values of f(n), the polynomial approximation
to P(A), are correct for f(13) and f(27).


3. Show that the “mowing the lawn” game is fair if and only if r and g, the number of
red and green marbles, respectively, are consecutive triangular numbers. (The first few
triangular numbers are 1, 1 + 2 = 3, 1 + 2 + 3 = 6, ...)


4. A fair coin is tossed until a head appears or until six tails have been obtained. 


(a) What is the probability the experiment ends in an even number of tosses? 
(b) Answer part (a) if the coin has been loaded so as to show heads with probability p. 


5. Let $P_r$ be the probability that among r people, at least two have the same birth month.
Make a table of values of $P_r$ for r = 2, 3, ..., 12. Plot a graph of $P_r$ as a function of r.


6. Two defective transistors become mixed up with two good ones. The four transistors
are tested one at a time, without replacement, until all the defectives are identified.
Find $P_r$, the probability that the rth transistor tested will be the second defective, for
r = 2, 3, 4.


7. A coin is tossed four times and the sequence of heads and tails is observed. 


(a) What is the probability that heads and tails occur equally often if the coin is fair 
and the tosses are independent? 

(b) Now suppose the coin is loaded so that P(H) = 1/3 and P(T) = 2/3 and that the 
tosses are independent. What is the probability that heads and tails occur equally 
often, given that the first toss is a head? 


8. The following model is sometimes used to model the spread of a contagious disease.
Suppose a box contains b black and r red marbles. A marble is drawn and c marbles
of that color together with the drawn marble are replaced in the box before the next
marble is drawn, so that infected persons infect others while immunity to the disease
may also increase.


(a) Find the probability that the first three marbles drawn are red. 

(b) Show that the probability of drawing a black on the second draw is the same as the 
probability of drawing a black on the first draw. 

(c) Show by induction that the probability the kth marble is black is the same as the 
probability of drawing a black on the first draw. 


9. A set of 25 items contains five defective items. Items are sampled at random one at a
time. What is the probability that the third and fourth defectives occur at the fifth and
sixth sample draws if


(a) the items are replaced after each is drawn? 
(b) the items are not replaced after each is drawn? 


10. A biased coin has probability 3/8 of coming up heads. A and B toss this coin with A 
tossing first. 


(a) Show that the probability that A gets a head before B gets a tail is very close 
to 1/2. 


(b) How can the coin be loaded so as to make the probability in part (a) 1/2? 
1.6 RELIABILITY OF SYSTEMS 


Mechanical and electrical systems are often composed of separate components which may
or may not function independently. The space shuttle, for example, comprises hundreds of
systems, each of which may have hundreds or thousands of components. The components
are, of course, subject to possible failure and these failures in turn may cause individual
systems to fail, and ultimately cause the entire system to fail. We pause here to consider,
in some situations, how the probability of failure of a component may influence the probability
of failure of the system of which it is a part.

In general, we refer to the reliability, R(t), of a component as the probability the component
will function properly, or survive, for a given period of time. If we denote the event
“the component lasts at least t units of time” by T > t, then

$$R(t) = P(T > t),$$

where t is fixed.

The reliability of the system depends on two factors: the reliability of its component 
parts as well as the manner in which they are connected. We will consider some systems in 
this section, which have few components and elementary patterns of connection. 

We will presume that interest centers on the probability an entire system lasts a given 
period of time; we will calculate this as a function of the probabilities the components last 
for that amount of time. To do this, we repeatedly use the addition law and multiplication 
of probabilities. 


Series Systems 


If a system of two components functions only if both of the components function, then the 
components are connected in series. Such a system is shown in Figure 1.16. 
Let $p_A$ and $p_B$ denote the reliabilities of the components A and B, that is,

$p_A$ = P(A survives at least t units of time) and

$p_B$ = P(B survives at least t units of time)

for some fixed value t.
If the components function independently, then the reliability of the system, say R, is
the product of the individual reliabilities so

R = P(A survives at least t units of time and B survives at least t units of time)

= P(A survives at least t units of time) · P(B survives at least t units of time) so

$$R = p_A \cdot p_B.$$


——>—— A ——>—— B ——>——


Figure 1.16 A series system of two components.


Parallel Systems 


If a system of two components functions if either (or both) of the components function, 
then the components are connected in parallel. Such a system is shown in Figure 1.17. 

One way to calculate the reliability of the system depends on the fact that at least one 
of the components must function properly for the given period of time so 


R = P(A or B survives for a given period of time) so,

by the addition law,

$$R = p_A + p_B - p_A \cdot p_B.$$

It is also clear, if the system is to function, that not both of the components can fail so

$$R = 1 - (1 - p_A) \cdot (1 - p_B).$$


These two expressions for R are equivalent. 

Figure 1.18 shows the reliability of both series and parallel systems as a function of
$p_A$ and $p_B$. The parallel system is always at least as reliable as the series system since, for the
parallel system to function, at least one of the components must function, while the series
system functions only if both components function simultaneously.

Series and parallel systems may be combined in fairly complex ways. We can calculate 
the reliability of the system from the formulas we have established. 
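
The two formulas translate directly into code and extend naturally to more than two components. The following sketch assumes that the components function independently; the function names `series` and `parallel` and the numerical reliabilities are illustrative only.

```python
def series(*reliabilities):
    """Reliability of independent components connected in series."""
    r = 1.0
    for p in reliabilities:
        r *= p  # every component must function
    return r


def parallel(*reliabilities):
    """Reliability of independent components connected in parallel."""
    all_fail = 1.0
    for p in reliabilities:
        all_fail *= 1.0 - p  # the system fails only if every component fails
    return 1.0 - all_fail


p_a, p_b = 0.9, 0.8  # hypothetical component reliabilities
print(series(p_a, p_b), parallel(p_a, p_b))
```

For two components, `parallel(p_a, p_b)` agrees with the addition law form $p_A + p_B - p_A \cdot p_B$.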


Example 1.6.1 


The reliability of the system shown in Figure 1.19 can be calculated by using the addition 
law and multiplication of probabilities. 

The connection of components A and B in the top section can be replaced by a single
component with reliability $p_A \cdot p_B$. The parallel connection of switches C and D can be
replaced by a single switch with reliability $1 - (1 - p_C) \cdot (1 - p_D)$. The reliability of the
resulting parallel system is then

$$1 - (1 - p_A \cdot p_B) \cdot (1 - [1 - (1 - p_C) \cdot (1 - p_D)]).$$


Figure 1.17 A parallel system of two components.

Figure 1.18 Reliability of series and parallel systems.

Figure 1.19 System for Example 1.6.1.


A graph of the surface generated, assuming $p_A = p_B$ and $p_C = p_D$, is shown in
Figure 1.20.

A contour plot of a surface shows values of $p_A$ and $p_C$ for which the reliability takes
on particular values. Figure 1.21 shows a contour plot of the surface for Example 1.6.1,
with contours specified at levels 0.80, 0.85, 0.90, 0.95, 0.99, and 0.995 for the reliability.
The contour plot shows that if either $p_A$ or $p_C$ is 1, then the reliability is 1. The next contour
shows choices of $p_A$ and $p_C$ giving reliability 0.995. The surface indicates that the system
is highly reliable if either of the components is highly reliable and that, otherwise, the
reliability declines rapidly.
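
The reduction used in Example 1.6.1 can be followed step by step in code. This is a sketch assuming independent components and the arrangement described in the example (A and B in series, in parallel with the parallel pair C and D); the function name and the numerical reliabilities are hypothetical.

```python
def example_reliability(p_a, p_b, p_c, p_d):
    """Reliability of the system of Example 1.6.1, assuming independence."""
    top = p_a * p_b                              # A and B in series
    bottom = 1.0 - (1.0 - p_c) * (1.0 - p_d)     # C and D in parallel
    return 1.0 - (1.0 - top) * (1.0 - bottom)    # the two sections in parallel


# Hypothetical values with p_A = p_B and p_C = p_D, as on the plotted surface.
print(example_reliability(0.9, 0.9, 0.8, 0.8))
```

Setting either pair of reliabilities to 1 returns a system reliability of 1, in agreement with the remark about the contour plot.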
Figure 1.20 Reliability surface for Example 1.6.1.

Figure 1.21 Contour plot for the surface in Figure 1.20.


EXERCISES 1.6


1. In the diagram below, let $p_A$, $p_B$, and $p_C$ be the reliabilities of the individual switches.
Determine the reliability of the system if


(a) at least one switch must function.
(b) at least two switches must function.


[Diagram showing switches A, B, and C.]


2. Determine the reliability of the system shown below if the reliability of any of the indi- 
vidual components is p. 


3. Find the reliability of the system shown below if $p_A = p_B$ and $p_C = p_D$. Then show the
surface giving the reliability of the system as a function of $p_A$ and $p_C$ and draw a contour
plot of the surface.




4. Find the reliability of the system below if each component has reliability 0.92. 


1.7 COUNTING TECHNIQUES 


Occasionally, sample spaces are encountered for which the sample points are equally likely. 
If this is the case, and if the sample space S contains n points, then, since the total probability 
in the sample space is 1, each point has probability 1/n. If we denote the mutually exclusive 
points in A by $a_i$, i = 1, 2, 3, ..., n, then the probability of an event, A, is the sum of the
probabilities of the sample points in A. That is,

$$P(A) = \sum_{a_i \in A} P(a_i) = \sum_{a_i \in A} \frac{1}{n}$$

so

$$P(A) = \frac{\text{Number of points in } A}{n} = \frac{\text{Number of points in } A}{\text{Number of points in } S}.$$


In order to consider problems leading to sample spaces with equally likely sample points, 
we pause to consider some techniques for counting sets of points. These techniques provide 
some challenging problems. 

The reader is first cautioned here to beware of concluding that just because a sample 
space has n points that each point has probability 1/n. For example, an airplane journey is 
either safely completed or not. One hopes these do not each have probability 1/2! 

The counting techniques considered here are based on two fundamental counting prin- 
ciples concerning mutually exclusive events A and B: 


Principle 1: If events A and B can occur in n and m ways, respectively, then A and B
can occur together in $n \cdot m$ ways.

Principle 2: If events A and B can occur in n and m ways, respectively, then A or B (but
not both) can occur in n + m ways.


Principle 1 is easily established since A can occur in n ways and then must be followed
by each way in which B can occur. A tree diagram, shown in Figure 1.22, illustrates the
result. Principle 2 simply uses the word “or” in an exclusive sense.

Figure 1.22 Tree diagram showing counting principle 1.


A linear arrangement of n distinct objects is called a permutation. For example, three 
distinct objects, say A, B, and C, can be arranged in six different ways: ABC, ACB, BAC, 
BCA, CAB, and CBA. So there are six permutations of three distinct objects. To count these 
permutations for n distinct objects, we use Principle 1. We have n choices for the object 
in the first position; that object chosen, we have $n - 1$ choices for the object in the second
position. Principle 1 tells us that there are $n \cdot (n - 1)$ ways to fill the first two positions.
Continuing, we have

$$n \cdot (n - 1) \cdot (n - 2) \cdots 3 \cdot 2 \cdot 1$$

ways to arrange all n of the items. We call this expression n! and note, for example, that
$3! = 3 \cdot 2 \cdot 1 = 6$, verifying the number of permutations of A, B, and C above.

The values of n! increase very rapidly: 1! = 1, 2! = 2, 3! = 6, 4! = 24, 5! = 120, and
10! is over 3 million. If we are interested in the number of permutations of even a small set,
we must be prepared to deal with immense quantities. For example, the cards in a deck of
52 cards can be arranged in 52! = 80,658,175,170,943,878,571,660,636,856,403,766,975,
289,505,440,883,277,824,000,000,000,000 different ways. The reader may be surprised to
find out how long it would take to enumerate these, even at a rate of 10,000 different
permutations per second. This consideration may also persuade us that shuffling a deck so that
each of these orders is equally likely is extremely unlikely.
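
Since Python integers have arbitrary precision, 52! and the enumeration time can be computed exactly. In this sketch the rate of 10,000 permutations per second comes from the text, while the use of a 365-day year is our own simplifying assumption.

```python
from math import factorial

orderings = factorial(52)  # number of arrangements of a 52-card deck

RATE_PER_SECOND = 10_000               # enumeration rate quoted in the text
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year

# Whole years needed to list every ordering at the quoted rate.
years_needed = orderings // (RATE_PER_SECOND * SECONDS_PER_YEAR)

print(f"{orderings:,}")
print(f"{years_needed:.3e} years")
```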

A fact that is useful is that 

$$n! = n \cdot (n - 1)!$$


If we wish to permute only r, say, of the n distinct objects, this can be done in

$$n \cdot (n - 1) \cdot (n - 2) \cdots (n - (r - 1)) \text{ ways.}$$


Multiplying and dividing by $(n - r)!$ shows that we can permute r of the n distinct objects
in

$$\frac{n!}{(n - r)!} \text{ ways.}$$


So that this formula will work when r = n, we define 0! = 1. If we wish to permute 5 cards
chosen from a deck of 52, this can be done in

$$52 \cdot 51 \cdot 50 \cdot 49 \cdot 48 = 311{,}875{,}200 \text{ ways.}$$


We note, multiplying and dividing by 47!, that this can also be written as $\frac{52!}{47!}$, a fact that
will be useful later.
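
Both forms of the count are easy to confirm numerically. The sketch below uses `math.perm`, which is available in Python 3.8 and later; the function name `permutations_of` is our own.

```python
from math import factorial, perm


def permutations_of(n, r):
    """Number of ways to permute r of n distinct objects: n! / (n - r)!."""
    return factorial(n) // factorial(n - r)


five_card_orders = 52 * 51 * 50 * 49 * 48  # the product written out in the text
print(five_card_orders, permutations_of(52, 5), perm(52, 5))
```

With r = n the formula gives n!/0! = n!, which is why 0! is defined to be 1.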

If a list of permutations is desired, then the reader is advised to do this using a computer
algebra system. The 4! = 24 permutations of the set {a, b, c, d} are shown by a computer
algebra system to be:


a b c d     c a b d
a b d c     c a d b
a c b d     c b a d
a c d b     c b d a
a d b c     c d a b
a d c b     c d b a
b a c d     d a b c
b a d c     d a c b
b c a d     d b a c
b c d a     d b c a
b d a c     d c a b
b d c a     d c b a


If we regard these permutations as being equally likely and if we want to find the
probability that a particular letter, say b, occupies its normal place, we can count the points
for which that is true and find that there are six of them. So

$$P(b \text{ is in second place}) = \frac{6}{24} = \frac{1}{4}.$$


What if the number of letters is large? An easy way to think about the problem is as follows:
if b is in its place and the set contains n distinct letters, we can arrange the remaining (n − 1)
letters in (n − 1)! ways. Since the entire set can be permuted in n! ways, the probability that
b, or any of the other particular letters, occupies its own place is

$$\frac{(n-1)!}{n!} = \frac{1}{n}.$$

This raises the question of other letters also occupying their own places. If we arrange
the letters entirely at random, what is the probability that at least one of the letters is in its
own place? The problem has been posed in the literature in many different ways, one of
which is this: n men enter a restaurant and each checks his hat; the hats become mixed up
during the evening and are passed out at the end of the evening in an entirely random way.
What is the probability that at least one man gets his own hat? Equivalently, if we assume
some natural order for the cards in a deck, what, after thorough shuffling, is the probability
that at least one of the cards is in its own position?

One way to solve the problem is to determine the number of derangements (where no
object occupies its own place) of a set of objects. For the permutations of the set {1, 2, 3, 4},
we find the derangements are as follows:

2, 1, 4, 3
2, 3, 4, 1
2, 4, 1, 3
3, 1, 4, 2
3, 4, 1, 2
3, 4, 2, 1
4, 1, 2, 3
4, 3, 1, 2
4, 3, 2, 1

a total of 9 derangements in this case. It follows that the probability that at least one object
occupies its own place is 1 − 9/24 = 15/24 = 0.625. Surely this is an awkward way to handle
larger sets, such as the deck of 52 cards. Surprisingly, we will find that the probability that
at least one card occupies its own place after a thorough shuffling of the deck is very close to
the probability above for four objects! We will explain this when we return to this problem
later in this section.
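The list of derangements above can be reproduced by brute force. The following Python sketch filters all permutations of a small set; the function name `derangements` is an assumption made for this illustration, and the approach is only practical for small sets.

```python
from itertools import permutations

def derangements(items):
    """All permutations of items in which no element stays in its original position."""
    items = tuple(items)
    return [p for p in permutations(items)
            if all(moved != original for moved, original in zip(p, items))]

d4 = derangements([1, 2, 3, 4])

# Probability that at least one object occupies its own place, for four objects.
p_at_least_one_match = 1 - len(d4) / 24

print(len(d4))
print(p_at_least_one_match)
```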

For now, consider arranging a set of objects when the objects are not all distinct. If we
permute the elements in the set {a, a, b, c}, we find there are 12 permutations:

a, a, b, c
a, a, c, b
a, b, a, c
a, b, c, a
a, c, a, b
a, c, b, a
b, a, a, c
b, a, c, a
b, c, a, a
c, a, a, b
c, a, b, a
c, b, a, a

so the set of 24 permutations (where the objects were distinct) has been cut in half. We
can arrive at this result by starting with the set {a, a, b, c}. Let R denote the number of
distinct permutations. If we then tag the a’s with subscripts, say as $a_1$ and $a_2$, then each of
the permutations in the above list yields 2! permutations with the subscripted a’s. Hence,
$2! \cdot R = 4!$, so $R = \frac{4!}{2!} = 12$. We could do exactly the same procedure with any set. Consider,
for example, the set {a, a, b, b, b, c, c, c, c}. By subscripting the a’s, b’s, and c’s, respectively,
and again letting R denote the number of distinct permutations, we conclude that

$$2! \cdot 3! \cdot 4! \cdot R = 9!, \quad \text{so} \quad R = \frac{9!}{2! \cdot 3! \cdot 4!} = 1260.$$


The example is perfectly typical of the general situation: if the set has $n_1$ objects of one
kind, $n_2$ of another, and so on until we have, say, $n_k$ of the kth kind, where $\sum_{i=1}^{k} n_i = n$, then
there are

$$\frac{n!}{n_1! \cdot n_2! \cdots n_k!}$$

distinct permutations of the n objects.

Example 1.7.1

In how many distinct ways can 10 A’s, 5 B’s, and 2 C’s be awarded to a class of 17 students?
Put the students in some order. Then each distinct permutation of the letters leads to a
different assignment of the grades. So there are

$$\frac{17!}{10! \cdot 5! \cdot 2!} = 408{,}408$$

different ways to assign the grades.
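The formula for permutations of objects that are not all distinct can be checked numerically. In the sketch below, `distinct_permutations` is a hypothetical helper that takes the counts of each kind of object; the brute-force comparison for {a, a, b, c} is included only as a sanity check.

```python
import math
from itertools import permutations

def distinct_permutations(*counts: int) -> int:
    """n! / (n_1! * n_2! * ... * n_k!), where n is the sum of the counts."""
    total = math.factorial(sum(counts))
    for c in counts:
        total //= math.factorial(c)  # each division is exact
    return total

# Brute-force check on the set {a, a, b, c}: collapse identical arrangements.
brute_force = len(set(permutations("aabc")))

print(distinct_permutations(2, 1, 1), brute_force)
print(distinct_permutations(2, 3, 4))
print(distinct_permutations(10, 5, 2))
```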

We turn now to combinations, that is, the distinguishable sets or samples of objects
that can be chosen from a set of n distinct objects, without regard for order. We denote
the number of these combinations of r objects chosen from n distinct objects by $\binom{n}{r}$, which we read “n
choose r.” We have already seen that $\binom{n}{2} = \frac{n(n-1)}{2}$ in Example 1.5.3. Now suppose we
have a set of objects and we want to choose a subset or sample of size 3. To be specific,
suppose there are four items: a, b, c, and d. It is easy to write down the four combinations
of size 3: a, b, c; a, b, d; a, c, d; and b, c, d. However, if we were dealing with a larger
set, it might be very difficult to write down a complete list without a procedure in mind. As
a suggestion, to create the samples of size 3, we could choose each of the samples of size
2 and then attach a third item. The resulting list is as follows:


a, b, c
a, b, d
a, c, b
a, c, d
a, d, b
a, d, c
b, c, a
b, c, d
b, d, a
b, d, c
c, d, a
c, d, b


Since we have 2 choices for the third item, the resulting list contains $2 \cdot \binom{4}{2}$ items. But
each of the combinations has occurred three times. Therefore,

$$\binom{4}{3} = \frac{2 \cdot \binom{4}{2}}{3} = 4.$$

This would appear to be a difficult way to arrive at $\binom{4}{3}$. The reasoning here, however,
can easily be extended and therein lies its advantage. Suppose we have a set of n distinct
items and we wish to choose a sample of size r. If we choose all the possible samples of
size r − 1 and then attach one of the n − r + 1 remaining items to each, the resulting list
has $(n-r+1) \cdot \binom{n}{r-1}$ items. But this counts each of the $\binom{n}{r}$ combinations r times. So,

$$(n-r+1) \cdot \binom{n}{r-1} = r \cdot \binom{n}{r}$$

or

$$\binom{n}{r} = \frac{n-r+1}{r} \cdot \binom{n}{r-1}. \tag{1.4}$$


This is a recurrence formula since it expresses some values of a function, here $\binom{n}{r}$, in
terms of other values of the same function. If we have a starting place, we can calculate any
value of the function we want. Here, since $\binom{n}{1} = n$, formula (1.4) shows that

$$\binom{n}{2} = \frac{n-2+1}{2} \cdot \binom{n}{1} = \frac{n \cdot (n-1)}{2} = \frac{n!}{2! \cdot (n-2)!},$$

verifying our previous result. We continue to apply formula (1.4) to find

$$\binom{n}{3} = \frac{n-2}{3} \cdot \binom{n}{2} = \frac{n \cdot (n-1) \cdot (n-2)}{3 \cdot 2} = \frac{n!}{3! \cdot (n-3)!}.$$

It can be concluded by an inductive proof that

$$\binom{n}{r} = \frac{n!}{r! \cdot (n-r)!}, \quad r = 0, 1, 2, \ldots, n,$$

using the recurrence formula above.
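The recurrence (1.4) translates directly into a loop. The Python sketch below is one possible rendering, not the only one; it starts from the assumed base value of 1 for r = 0 (so that the first step gives n for r = 1, as in the text) and compares the results with the standard library function `math.comb`. The function name `choose_by_recurrence` is hypothetical.

```python
import math

def choose_by_recurrence(n: int, r: int) -> int:
    """Binomial coefficient built up with C(n, k) = (n - k + 1) / k * C(n, k - 1)."""
    value = 1  # assumed starting place: C(n, 0) = 1
    for k in range(1, r + 1):
        # value * (n - k + 1) equals k * C(n, k), so the division is exact.
        value = value * (n - k + 1) // k
    return value

print(choose_by_recurrence(4, 3))
print(choose_by_recurrence(52, 5) == math.comb(52, 5))
```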
If we have a set of n distinct objects and r are chosen, then n − r objects must remain
unchosen. Since each time the chosen set is altered, so is the unchosen set, it follows that

$$\binom{n}{r} = \binom{n}{n-r}.$$

The quantities $\binom{n}{r}$ are often called binomial coefficients since they occur in the binomial
expansion:

Binomial Theorem:

$$(a+b)^n = \sum_{r=0}^{n} \binom{n}{r} a^{n-r} b^{r} = \sum_{r=0}^{n} \binom{n}{r} a^{r} b^{n-r}. \tag{1.5}$$


For example,

$$(a+b)^5 = \binom{5}{0} a^5 + \binom{5}{1} a^4 b + \binom{5}{2} a^3 b^2 + \binom{5}{3} a^2 b^3 + \binom{5}{4} a b^4 + \binom{5}{5} b^5$$

$$= a^5 + 5a^4 b + 10a^3 b^2 + 10a^2 b^3 + 5ab^4 + b^5.$$

Many interesting identities are known concerning the binomial coefficients. If a = 1
and b = 1 are substituted in formula (1.5), the result is

$$2^n = (1+1)^n = \sum_{r=0}^{n} \binom{n}{r} = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n}.$$

Each side of this result may be recognized as the number of possible subsets (including
the null set) that can be chosen from a set of n distinct items.
If we differentiate (1.5) with respect to a and then let a = 1 and b = 1, the result is

$$n \cdot 2^{n-1} = \sum_{r=0}^{n} \binom{n}{r} \cdot r = 0 \cdot \binom{n}{0} + 1 \cdot \binom{n}{1} + 2 \cdot \binom{n}{2} + \cdots + n \cdot \binom{n}{n}.$$


We show one more fact concerning the binomial coefficients. Suppose we want to
choose a committee of size r from a group of n people, one of whom is Sam. Sam
is a member of $\binom{n-1}{r-1}$ committees and he is not a member of $\binom{n-1}{r}$ committees, so, since we
have exhausted the possibilities,

$$\binom{n}{r} = \binom{n-1}{r-1} + \binom{n-1}{r}.$$

This is often known as Pascal’s identity since it occurs in Pascal’s triangle of
binomial coefficients.
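The three identities discussed here (the subset count, the weighted sum obtained by differentiation, and Pascal's identity) are easy to spot-check numerically. The following Python sketch does so for a given n; the function name `check_identities` is an assumption for this illustration, and a numerical check is of course not a proof.

```python
from math import comb

def check_identities(n: int) -> bool:
    """Spot-check three binomial identities for a given n >= 1."""
    # Sum of C(n, r) over r equals 2^n (number of subsets).
    subsets_ok = sum(comb(n, r) for r in range(n + 1)) == 2 ** n
    # Sum of r * C(n, r) over r equals n * 2^(n-1).
    weighted_ok = sum(r * comb(n, r) for r in range(n + 1)) == n * 2 ** (n - 1)
    # Pascal's identity: C(n, r) = C(n-1, r-1) + C(n-1, r).
    pascal_ok = all(comb(n, r) == comb(n - 1, r - 1) + comb(n - 1, r)
                    for r in range(1, n + 1))
    return subsets_ok and weighted_ok and pascal_ok

print(all(check_identities(n) for n in range(1, 21)))
```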
It is also necessary for us, although this may seem unnatural to the reader, to ascribe
some meaning to a symbol such as $\binom{-7}{3}$. Clearly, we cannot interpret this as the choice of
3 objects from −7 objects! The following definition, while including our previous
interpretation of $\binom{n}{r}$, allows us to extend its meaning as well.

Definition: $\binom{n}{r} = \frac{n \cdot (n-1) \cdot (n-2) \cdots (n-r+1)}{r!}$, provided that r is a nonnegative integer.

Using the above definition, we have that $\binom{-7}{3} = \frac{(-7) \cdot (-8) \cdot (-9)}{3!} = -84$. We will need
facts such as this in subsequent chapters.
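The extended definition can be coded as a falling product divided by r!. The Python sketch below assumes an integer upper argument n (possibly negative); the function name `general_binomial` is hypothetical.

```python
from fractions import Fraction
from math import factorial

def general_binomial(n: int, r: int) -> Fraction:
    """n(n-1)(n-2)...(n-r+1) / r!, for any integer n and nonnegative integer r."""
    if r < 0:
        raise ValueError("r must be a nonnegative integer")
    numerator = 1
    for k in range(r):
        numerator *= n - k  # falling product with r factors
    return Fraction(numerator, factorial(r))

print(general_binomial(-7, 3))
print(general_binomial(5, 2))
```

For nonnegative n the function agrees with the earlier counting interpretation, and for negative n it produces the signed values needed in the binomial series.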

Using this definition, the binomial theorem can also be used with negative exponents.
For example,

$$(a+b)^{-5} = a^{-5} + \binom{-5}{1} a^{-6} b + \binom{-5}{2} a^{-7} b^2 + \binom{-5}{3} a^{-8} b^3 + \cdots$$

$$= a^{-5} - 5a^{-6} b + 15a^{-7} b^2 - 35a^{-8} b^3 + \cdots$$


We now use some of the results found here in some examples. 


Example 1.7.2 


A box of manufactured items contains 8 items that are good and 3 that are not usable. What
is the probability that a sample of 5 items contains exactly 1 unusable item?

Suppose that each of the samples has probability $1 / \binom{11}{5}$. There are $\binom{8}{4}$ ways to choose
the 4 good items and $\binom{3}{1}$ ways to choose the unusable item. The multiplication principle
then gives $\binom{3}{1} \cdot \binom{8}{4}$ ways to choose exactly 1 unusable item. So the probability we seek is

$$\frac{\binom{3}{1} \cdot \binom{8}{4}}{\binom{11}{5}} = \frac{3 \cdot 70}{462} = \frac{5}{11}.$$
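The calculation in this example can be written as a small function. The Python sketch below uses exact fractions; the function name `prob_unusable` and its default parameter values (8 good items, 3 unusable items, sample of 5) are chosen to mirror the example and are otherwise assumptions.

```python
from fractions import Fraction
from math import comb

def prob_unusable(k: int, good: int = 8, bad: int = 3, sample: int = 5) -> Fraction:
    """Probability that a random sample contains exactly k unusable items."""
    favorable = comb(bad, k) * comb(good, sample - k)
    return Fraction(favorable, comb(good + bad, sample))

print(prob_unusable(1))
# The probabilities over all possible values of k should add to 1.
print(sum(prob_unusable(k) for k in range(0, 4)))
```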


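Finally, in this chapter, we consider the general addition law for n events, having established
the addition law for two and for three events. So we seek to prove

Theorem 4:

$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum P(A_i) - \sum P(A_i \cap A_j) + \sum P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1} P(A_1 \cap A_2 \cap \cdots \cap A_n),$$

where the sums are over all the distinct items in the summand, that is, where i > j > k > ···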


Proof We again use the principle of inclusion and exclusion. Consider a point in $A_1 \cup A_2 \cup \cdots \cup A_n$
which is in exactly k of the events $A_i$. It will be convenient to renumber the
$A_i$’s if necessary so that the point is in the first k of these events. We will now show that
the right-hand side of Theorem 4 counts this point exactly once, showing the theorem to be
correct.

The point is counted on the right-hand side of Theorem 4

$$\binom{k}{1} - \binom{k}{2} + \binom{k}{3} - \cdots + (-1)^{k-1} \binom{k}{k} \text{ times.}$$

But the binomial expansion of $0 = [1 + (-1)]^k = \sum_{i=0}^{k} \binom{k}{i} \cdot (-1)^i$ shows that

$$\binom{k}{1} - \binom{k}{2} + \binom{k}{3} - \cdots + (-1)^{k-1} \binom{k}{k} = \binom{k}{0} = 1,$$

establishing the result.
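The general addition law can be tried out on a small sample space of equally likely points. The Python sketch below sums the alternating terms over all groups of events and compares the result with the probability of the union computed directly. The events chosen (subsets of the faces of one fair die) and the function name `union_probability` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from itertools import combinations

def union_probability(events, sample_space_size):
    """P(A_1 u ... u A_n) by inclusion and exclusion, for equally likely points."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1  # +, -, +, ... as in the theorem
        for group in combinations(events, k):
            common = set.intersection(*group)
            total += sign * Fraction(len(common), sample_space_size)
    return total

# Hypothetical events for one roll of a fair die.
A1 = {2, 4, 6}
A2 = {4, 5, 6}
A3 = {1, 2}

by_theorem = union_probability([A1, A2, A3], 6)
direct = Fraction(len(A1 | A2 | A3), 6)
print(by_theorem, direct)
```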


Example 1.7.3 


We return to the matching problem stated earlier in this section: If n integers are randomly 
arranged in a row, what is the probability that at least one of them occupies its own place? 
The general addition law can be used to provide the solution. 
Let $A_i$ denote the event, “number i is in the ith place.” We seek $P(A_1 \cup A_2 \cup \cdots \cup A_n)$.
Here $P(A_i) = \frac{(n-1)!}{n!}$, since, after i is put in its own place, there are (n − 1)! ways to
arrange the remaining numbers; $P(A_i \cap A_j) = \frac{(n-2)!}{n!}$, since if i and j occupy their own places
we can permute the remaining n − 2 numbers in (n − 2)! ways; and, in general,
$P(A_1 \cap A_2 \cap \cdots \cap A_k) = \frac{(n-k)!}{n!}$. Now we note that there are $\binom{n}{1}$ choices for an individual number i; there
are $\binom{n}{2}$ choices for pairs of numbers i and j; and, in general, there are $\binom{n}{k}$ choices for k of
the numbers. So, applying Theorem 4,

$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = \binom{n}{1} \frac{(n-1)!}{n!} - \binom{n}{2} \frac{(n-2)!}{n!} + \binom{n}{3} \frac{(n-3)!}{n!} - \cdots + (-1)^{n-1} \binom{n}{n} \frac{(n-n)!}{n!}.$$

This simplifies to

$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = \frac{1}{1!} - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n-1} \frac{1}{n!}.$$
A table of values of this expression is shown below.

n    P
1    1.000000
2    0.500000
3    0.666667
4    0.625000
5    0.633333
6    0.631944
7    0.632143
8    0.632118
9    0.632121
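The table can be regenerated from the alternating series for the matching probability. The Python sketch below evaluates the partial sums with exact fractions; the function name `match_probability` is an assumption for this illustration.

```python
from fractions import Fraction
from math import exp, factorial

def match_probability(n: int) -> Fraction:
    """1/1! - 1/2! + 1/3! - ... up to the term in 1/n!."""
    return sum(Fraction((-1) ** (k - 1), factorial(k)) for k in range(1, n + 1))

for n in range(1, 10):
    print(n, f"{float(match_probability(n)):.6f}")

# For a 52-card deck the value is indistinguishable from 1 - 1/e in floating point.
print(float(match_probability(52)), 1 - exp(-1))
```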


To six decimal places, the probability that at least one number is in its natural
position remains at 0.632121 for n ≥ 9. An explanation for this comes from a series expansion
for $e^x$:

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots$$

So

$$e^{-1} = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots$$

or

$$1 - e^{-1} = \frac{1}{1!} - \frac{1}{2!} + \frac{1}{3!} - \frac{1}{4!} + \cdots$$

So we see that $\frac{1}{1!} - \frac{1}{2!} + \frac{1}{3!} - \cdots$ approaches $1 - \frac{1}{e} = 0.632120559\ldots$ This is our
first, but certainly not our last, encounter with e in a probability problem. This also explains
why we remarked that the probability that at least one card in a shuffled deck of 52 cards is
in its natural position differs little from that for a deck consisting of only 9 cards.

We turn now to some examples using the results established in this section. 


Example 1.7.4 


Five red and four blue marbles are arranged in a row. What is the probability that both the 
end marbles are blue? 

A basic decision in the solution of the problem concerns the type of sample space to be
used. Clearly, the problem involves order, but should we consider the marbles to be distinct
or not?

Initially, consider the marbles to be alike, except for color of course. There are $\frac{9!}{5! \cdot 4!} = 126$
possible orderings of the marbles and we consider each of these to be equally likely.
Since the blue marbles are indistinct from each other, and since our only choice here is the
arrangement of the 7 marbles in the middle, it follows that there are $\frac{7!}{5! \cdot 2!} = 21$ arrangements
with blue marbles at the ends. The probability we seek is then $\frac{21}{126} = \frac{1}{6}$.

Now if we consider each of the marbles to be distinct, there are 9! possible arrangements.
Of these, we have $\binom{4}{2} \cdot 2! = 12$ ways to arrange the blue marbles at the ends and 7!
ways to arrange the marbles in the middle. This produces a probability of $\frac{12 \cdot 7!}{9!} = \frac{1}{6}$.

The two methods must produce the same result, but the reader may find one method 
easier to use than another. In any event, it is crucial that the sample space be established 
as a first step in the solution of the problem and that the events of interest be dealt with 
consistently for this sample space. 

The reader may enjoy showing that, if we have n marbles, r of which are red and b
of which are blue, then the probability both ends are blue in a random arrangement of the
marbles is given by the product

$$\left(1 - \frac{r}{n}\right) \cdot \left(1 - \frac{r}{n-1}\right).$$

This answer may indicate yet another way to solve the problem, namely this: the probability
the first marble is blue is $\frac{n-r}{n}$. Given that the first end is blue, the conditional probability
the other end is also blue is $\frac{n-r-1}{n-1}$. Often probability problems involving counting
techniques can be solved in a variety of ways.
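The agreement between the product formula and direct counting can be checked for small cases. The Python sketch below compares the two for n marbles of which r are red; both function names are hypothetical, and the counting version treats marbles of one color as alike, as in the first solution above.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def both_ends_blue(n: int, r: int) -> Fraction:
    """Product formula: (n - r)/n times (n - r - 1)/(n - 1)."""
    return Fraction(n - r, n) * Fraction(n - r - 1, n - 1)

def both_ends_blue_by_counting(n: int, r: int) -> Fraction:
    """Count the placements of the blue marbles that cover both end positions."""
    blue = n - r
    favorable = sum(1 for positions in combinations(range(n), blue)
                    if 0 in positions and n - 1 in positions)
    return Fraction(favorable, comb(n, blue))

print(both_ends_blue(9, 5), both_ends_blue_by_counting(9, 5))
```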


Example 1.7.5 


Ten race cars, numbered from 1 to 10, are running around a track. An observer sees three
cars go by. If the cars appear in random order, what is the probability that the largest number
seen is 6?

The choice of the sample space here is natural: consider all the $\binom{10}{3}$ samples of three
cars that could be observed. If the largest is to be 6, then 6 must be in the sample, together
with two cars chosen from the first 5, so the probability of the event “Maximum = 6” is

$$P(\text{Maximum} = 6) = \frac{\binom{5}{2}}{\binom{10}{3}} = \frac{10}{120} = \frac{1}{12}.$$

It is also interesting now to look at the median or the number in the middle when the three
observed numbers are arranged in order. What is the probability that the median of the
group of three is 6?

For the median to be 6, 6 must be chosen and we must choose exactly one number from
the set {1, 2, 3, 4, 5} and exactly one number from {7, 8, 9, 10}. Then

$$P(\text{Median} = 6) = \frac{\binom{5}{1} \cdot \binom{1}{1} \cdot \binom{4}{1}}{\binom{10}{3}} = \frac{20}{120} = \frac{1}{6}.$$

This can be generalized to

$$P(\text{Median} = k) = \frac{\binom{1}{1} \cdot \binom{k-1}{1} \cdot \binom{10-k}{1}}{\binom{10}{3}} = \frac{(k-1)(10-k)}{120}, \quad k = 2, 3, \ldots, 9.$$


Figure 1.23 shows a graph of P(Median = k) for k = 2, 3, ... ,9. It reveals a symmetry in 
the function around k = 5.5. 

The problem is easily generalized with a result that may be surprising. Suppose there
are 100 cars and we observe a sample of 9 of them. The median of the sample must be at
least 5 and can be at most 96. The probability the median is k then is

$$P(\text{Median} = k) = \frac{\binom{k-1}{4} \cdot \binom{1}{1} \cdot \binom{100-k}{4}}{\binom{100}{9}}, \quad k = 5, 6, \ldots, 96.$$

[Figure 1.23 P(Median = k) for a sample of size 3 chosen from 10 cars. Horizontal axis: Median; vertical axis: Probability.]

A graph of this function (an eighth degree polynomial in k) is shown in Figure 1.24.

The graph here shows a “bell shape” that, as we will see, is very common in probability 
problems. The curve is very close to what we will call a normal curve. Larger values for 
the number of cars involved will, surprisingly, not change the approximate normal shape 
of the curve! An approximation for the actual curve involved here can be found when we 
study the normal curve thoroughly in Chapter 3. 
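The two median distributions in this example follow the same pattern, so they can be computed with one function. The Python sketch below assumes an odd sample size; the function name `median_probability` and its parameter names are hypothetical.

```python
from fractions import Fraction
from math import comb

def median_probability(k: int, cars: int, sample: int) -> Fraction:
    """P(Median = k) for an odd-sized sample drawn without replacement from 1..cars."""
    half = (sample - 1) // 2  # how many sampled values lie below (and above) the median
    favorable = comb(k - 1, half) * comb(cars - k, half)
    return Fraction(favorable, comb(cars, sample))

# Sample of 3 from 10 cars: matches (k - 1)(10 - k)/120.
print([median_probability(k, 10, 3) for k in range(2, 10)])

# Sample of 9 from 100 cars: the most likely medians are in the middle.
print(float(median_probability(50, 100, 9)))
```

Summing the probabilities over all possible medians gives 1 in both cases, which is a useful check on the formula.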


[Figure 1.24 P(Median = k) for a sample of 9 chosen from 100 cars. Horizontal axis: Median; vertical axis: Probability.]


Example 1.7.6


We can use the result of Example 1.7.3 to count the number of derangements of a set of n
objects. That is, we want to count the number of permutations in which no object occupies
its own place. Example 1.7.3 shows that the number of permutations of n distinct objects
in which at least one object occupies its own place is

$$n! \left( \frac{1}{1!} - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n-1} \frac{1}{n!} \right).$$

It follows that the number of derangements of n distinct objects is

$$n! - n! \left( \frac{1}{1!} - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n-1} \frac{1}{n!} \right) = n! \left( \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots + (-1)^{n} \frac{1}{n!} \right). \tag{1.6}$$

Using this formula we find that if n = 2, there is 1 derangement; if n = 3, there are 2
derangements; and if n = 4, there are 9 derangements.

Formula (1.6) also suggests that the number of derangements of n distinct objects is
approximately $n! \cdot e^{-1}$ (see the series expansion for $e^{-1}$ in Example 1.7.3). The following
table compares the results of formula (1.6) and the approximation:

n    Number of derangements    n! · e^{-1}
2    1                         0.7358
3    2                         2.207
4    9                         8.829
5    44                        44.146
6    265                       264.87
7    1854                      1854.11

We see that in every case, the number of derangements is given by $\lfloor n! \cdot e^{-1} + 0.5 \rfloor$,
where the symbols indicate the greatest integer function.
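The comparison in the table can be reproduced with a few lines of Python. The sketch below evaluates formula (1.6) with exact fractions and compares it with the rounded approximation; both function names are hypothetical, and the floating-point estimate is only trusted here for small n.

```python
from fractions import Fraction
from math import e, factorial, floor

def derangement_count(n: int) -> int:
    """Formula (1.6): n! * (1/2! - 1/3! + ... + (-1)^n / n!)."""
    series = sum(Fraction((-1) ** k, factorial(k)) for k in range(2, n + 1))
    return int(factorial(n) * series)

def derangement_estimate(n: int) -> int:
    """Greatest integer in n!/e + 0.5 (floating point, so small n only)."""
    return floor(factorial(n) / e + 0.5)

for n in range(2, 8):
    print(n, derangement_count(n), round(factorial(n) / e, 4), derangement_estimate(n))
```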


Example 1.7.7 (Ken-Ken Puzzles)


The New York Times, as well as many other newspapers, publishes a Ken-Ken puzzle daily.
The puzzle consists of a square with 4 rows and 4 columns. The problem is to insert
each of the digits 1, 2, 3, and 4 into each row and each column so that each digit appears
exactly once in each row and each column. The reader is given arithmetic clues for some
squares. For example, 5+ may indicate the proper entries are 4 1 (but there are many other
possibilities). Here is an example of a solved puzzle (without the clues):


[4 × 4 grid of a solved puzzle]


Clearly, each row (and hence each column) is a permutation of the integers 1 through
4, but each row (and column) must be a derangement of each of the previous rows (or
columns). How many Ken-Ken puzzles are there?

Since we will be permuting the rows and columns later, we might just as well start
with the row 1 2 3 4. For the second row, we must select one of the 9 derangements
of the integers 1, 2, 3, 4, as listed earlier in this section. We will choose 2 4 1 3, so now
we have

1 2 3 4
2 4 1 3

By examining the 9 derangements again, we find only two choices
for the third row: 3 1 4 2 or 4 3 2 1. When one of these is chosen, there is only
one choice for the fourth row: the derangement that was not selected for the third row.
Selecting the first choice for the third row, we have


1 2 3 4
2 4 1 3
3 1 4 2
4 3 2 1


Now the rows and the columns may be permuted in 4! · 4! ways, so the total number
of Ken-Ken puzzles with 4 rows and 4 columns is 9 · 2 · 4! · 4! = 10,368.


EXERCISES 1.7 


1. The integers 1, 2, 3,..., 9 are arranged in a row, resulting in a nine-digit integer. What 
is the probability that 


(a) the integer resulting is even? 
(b) the integer resulting is divisible by 5? 
(c) the digits 6 and 4 are next to each other? 


2. License plates in Indiana consist of a number from 1 to 99 (indicating the county of 
registration), a letter of the alphabet, and finally an integer from 1 to 9999. How many 
cars may be licensed in Indiana? 


3. Prove that at least two people in Colorado Springs, Colorado, have the same three
initials.
4. In a small school, 5 centers, 8 guards, and 6 forwards try out for the basketball team.


(a) How many five-member teams can be formed from these players? (Assume a team 
has two guards, two forwards, and one center.) 


(b) Intercollegiate regulations require that no more than 8 players can be listed for the
team roster. How many rosters can be formed consisting of exactly 8 players?


5. A restaurant offers 5 appetizers, 7 main courses, and 8 desserts. How many meals can
be ordered


(a) assuming all three courses are ordered?
(b) not assuming all three courses are necessarily ordered?


6. A club of 56 people has 40 men and 16 women. What is the probability the board of
directors, consisting of 8 members, contains no women?


7. In a controlled experiment, 12 patients are to be randomly assigned to each of three
different drug regimens. In how many ways can this be done if each drug is to be tested
on 4 patients?


8. In the game Keno, the casino draws 20 balls from a set of balls numbered from 1 to 80.
A player must choose 10 numbers in advance of this drawing. What is the probability
the player has exactly five of the 20 numbers drawn?


9. A lot of 10 refrigerators contains 3 which are defective. The refrigerators are randomly
chosen and shipped to customers. What is the probability that by the seventh shipment,
none of the defective refrigerators remain?


10. In how many different ways can the letters in the word “repetition” be arranged?


11. In a famous correspondence in the very early history of probability, the Chevalier
de Méré wrote to the mathematician Blaise Pascal and asked the following question,
“Which is more likely: at least one six in four rolls of a fair die or at least one sum of
12 in 24 rolls of a pair of dice?”

(a) Show that the two questions above have nearly equal answers. Which is more
likely?

(b) A generalization of the Pascal-de Méré problem is: what is the probability that the
sum 6n occurs at least once in $4 \cdot 6^{n-1}$ rolls of n fair dice? Show that the answer is
very nearly 1/2 for n ≤ 5.

(c) Show that in part (b) the probability approaches $1 - e^{-2/3}$ as n → ∞.

12. A box contains 8 red and 5 yellow marbles from which a sample of 3 is drawn.


(a) Find the probability that the sample contains no yellow marbles if


(1) the sampling is done without replacement; and,
(2) the sampling is done with replacement.


(b) Now suppose the box contains 24 red and 15 yellow marbles (so that the ratio of
reds to yellows is the same as in part (a)). Calculate the answers to part (a). What
do you expect to happen as the number of marbles in the box increases but the ratio
of reds to yellows remains the same?

13. (a) From a group of 20 people, two samples of size 3 are chosen, the first sample being
replaced before the second sample is chosen. What is the probability the samples
have at least one person in common?

(b) Show that two bridge hands, the first being replaced before the second is drawn,
are virtually certain to contain at least one card in common.

14. A shipment of 20 components will be accepted by a buyer if a random sample of 3
(chosen without replacement) contains no defectives. What is the probability the shipment
will be rejected if actually 2 of the components are defective?

15. A deck of cards is shuffled and the cards turned up one at a time. What is the probability
that all the aces will appear before any of the 10’s?

16. In how many distinguishable ways can 6 A’s, 4 B’s, and 8 C’s be assigned as grades to
18 students?

17. What is the probability a poker hand (5 cards drawn from a deck of 52 cards) has exactly
2 aces?

18. In how many ways can 6 students be seated in 10 chairs?


19. Ten children are to be grouped into two clubs, say the Lions and the Tigers, with five
children in each club. Each club is then to elect a president and a secretary. In how
many ways can this be done?


20. A small pond contains 50 fish, 10 of which have been tagged. If a catch of 7 fish is
made, in how many ways can the catch contain exactly 2 tagged fish?

21. From a fleet of 12 limousines, 6 are to go to hotel I, 4 to hotel II, and the remainder to
hotel III. In how many different ways can this be done?

22. The grid shows a region of city blocks defined by 7 streets running North-South and
8 streets running East-West. Joe will walk from corner A to corner B. At each corner
between A and B, Joe will choose to walk either North or East.


[Figure: street grid showing corners A and B and intersection C]


(a) How many possible routes are there?

(b) Assuming that each route is equally likely, find the probability that Joe will pass
through intersection C.

23. Suppose that N people are arranged in a line. What is the probability that two particular
people, say A and B, are not next to each other?

24. The Hawaiian language has only 12 letters: the vowels a, e, i, o, and u and the
consonants h, k, l, m, n, p, and w.

(a) How many possible three-letter Hawaiian “words” are there? (Some of these may
be nonsense words.)

(b) How many three-letter “words” have no repeated letter?


(c) What is the probability a randomly selected three-letter “word” begins with a
consonant and ends with 2 different vowels?

(d) What is the probability that a randomly selected three-letter “word” contains all
vowels?

25. How many partial derivatives of order 4 are there for a function of 4 variables? 


26. A set of 15 marbles contains 4 red and 11 green marbles. They are selected, one at a 
time, without replacement. In how many ways can the last red marble be drawn on the 
seventh selection? 


27. A true-false test has four questions. A student is not prepared for the test and so must
guess the answer to each question. 


(a) What is the probability the student answers at least half of the questions correctly? 


(b) Now suppose, in a sudden flash of insight, he knows the answer to question 2 is 
“false.” What is the probability he answers at least half of the questions correctly? 


28. What is the probability of being dealt a bridge hand (13 cards selected from a deck of 
52 cards) that does not contain a heart? 


29. Explain why the number of derangements of n distinct objects is given by $\lfloor n! \cdot e^{-1} + 0.5 \rfloor$.
Explain why $n! \cdot e^{-1}$ sometimes underestimates the number of derangements and
sometimes overestimates the number of derangements. $\lfloor x \rfloor$ denotes the greatest integer
in x.


30. Find the number of Ken-Ken puzzles if the grid is 5 × 5 for the integers 1, 2, 3, 4, 5.


CHAPTER REVIEW 


In dealing with an experiment or situation involving random or chance elements, it is
reasonable to begin an analysis of the situation by asking the question, “What can happen?” An
enumeration of all the possibilities is called a sample space. Generally, situations admit of
more than one sample space; the appropriate one chosen is usually governed by the
probabilities that one wants to compute. Several examples of sample spaces are given in this
chapter, each of them discrete, that is, either the sample space has a finite number of points
or a countably infinite number of points.

Tossing two dice yields a sample space with a finite number of points; observing births 
until a girl is born gives a sample space with an infinite (but countable) number of points. 
In the next chapter, we will encounter continuous sample spaces that are characterized by 
a noncountably infinite number of points. 

Assessing the long-range relative frequency, or probability, of any of the points or sets
of points (which we refer to as events) is the primary goal of this chapter. We use the set
symbols ∪ for the union of two events and ∩ for the intersection of two events. We begin
with three assumptions or axioms concerning sample spaces:


(1) P(A) ≥ 0, where A is an event;
(2) P(S) = 1, where S is the entire sample space; and,


(3) P(A or B) = P(A ∪ B) = P(A) + P(B) if A and B are disjoint, or mutually exclusive; that is,
they have no sample points in common.


From these assumptions, we derived several theorems concerning probability, among 
them: 


(1) $P(A) = \sum_{a_i \in A} P(a_i)$, where the $a_i$ are distinct points in S.
(2) P(A ∪ B) = P(A) + P(B) − P(A ∩ B) (the addition law for two events).
(3) $P(\bar{A}) = 1 - P(A)$.

We showed the Law of Total Probability.


Theorem (Law of Total Probability): If the sample space $S = A_1 \cup A_2 \cup \cdots \cup A_n$, where
$A_i$ and $A_j$ have no sample points in common if i ≠ j, then, if B is an event,


$$P(B) = P(A_1) \cdot P(B|A_1) + P(A_2) \cdot P(B|A_2) + \cdots + P(A_n) \cdot P(B|A_n).$$


We then turned our attention to problems of conditional probability where we sought 
the probability of some event, say A, on the condition that some other event, say B, has 
occurred. We showed that 


$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \cdot P(B|A)}{P(A) \cdot P(B|A) + P(\bar{A}) \cdot P(B|\bar{A})}.$$


This can be generalized using the Law of Total Probability as follows: 


Theorem (Bayes’ Theorem): If $S = A_1 \cup A_2 \cup \cdots \cup A_n$, where $A_i$ and $A_j$ have no sample
points in common for i ≠ j, then, if B is an event,


$$P(A_i|B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(A_i) \cdot P(B|A_i)}{P(A_1) \cdot P(B|A_1) + P(A_2) \cdot P(B|A_2) + \cdots + P(A_n) \cdot P(B|A_n)} = \frac{P(A_i) \cdot P(B|A_i)}{\sum_{j=1}^{n} P(A_j) \cdot P(B|A_j)}.$$


Bayes’ theorem has a simple geometric interpretation. The chapter contains many 
examples of this. 
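The Law of Total Probability and Bayes' theorem combine naturally in a short computation. The Python sketch below takes the prior probabilities P(A_j) and the conditional probabilities P(B|A_j) and returns the posterior probabilities P(A_j|B) together with P(B). The function name `posterior` and the numbers used are hypothetical and do not come from any example in the chapter.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return ([P(A_j | B) for each j], P(B)) from P(A_j) and P(B | A_j)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A_j) * P(B | A_j)
    total = sum(joint)                                     # Law of Total Probability
    return [j / total for j in joint], total

# Hypothetical partition of S into three events.
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)]

posteriors, p_b = posterior(priors, likelihoods)
print(p_b)
print(posteriors)
```

Note that the posterior probabilities always add to 1, since the denominator is the sum of the numerators.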
We defined the independence of two events, A and B as follows: 


A and B are independent if P(A ∩ B) = P(A) · P(B).


We then applied the results of this chapter to some specific probability problems, such
as the well-known birthday problem and a geometric problem involving the sides and
diagonals of a polygonal figure.

Finally, we considered some very special counting techniques which are useful, it is to
be emphasized, only if the points in the sample space are equally likely. If that is so, then
the probability of an event, say A, is

$$P(A) = \frac{\text{Number of points in } A}{\text{Number of points in } S}.$$

If order is important, then all the permutations of objects may well comprise the
sample space. We showed that there are $n! = n \cdot (n-1) \cdot (n-2) \cdots 3 \cdot 2 \cdot 1$ permutations of n
distinct objects.

If order is not important, then the sample space may well comprise various
combinations of items. We showed that there are

$$\binom{n}{r} = \frac{n!}{r! \cdot (n-r)!}$$

samples of size r that can be selected from n distinct objects and applied this formula to
several examples. A large number of identities are known concerning these combinations,
or binomial coefficients, among them:

(1) $\sum_{r=0}^{n} \binom{n}{r} = 2^n$

(2) $\binom{n}{r} = \binom{n-1}{r-1} + \binom{n-1}{r}$


One very important result from this section is the general addition law: 


Theorem:

$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum P(A_i) - \sum P(A_i \cap A_j) + \sum P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1} P(A_1 \cap A_2 \cap \cdots \cap A_n),$$

where the summations are over i > j > k > ···


PROBLEMS FOR REVIEW 


Exercises 1.1 # 1, 2, 5, 7, 9, 11
Exercises 1.3 # 1, 2, 6, 7, 9, 13
Exercises 1.4 # 1, 2, 3, 6, 10, 15, 16, 18, 19, 21, 24
Exercises 1.5 # 2, 3, 6, 7
Exercises 1.6 # 1, 3
Exercises 1.7 # 1, 6, 8, 10, 12, 13, 16, 17, 20, 23, 28


SUPPLEMENTARY EXERCISES FOR CHAPTER 1 


1. A hat contains slips of paper on which each of the integers 1, 2, ... , 20 is written. A
sample of size 6 is drawn (without replacement) and the sample values, $x_i$, put in order
so that $x_1 < x_2 < \cdots < x_6$. Find the probability that x, = 12.


2. Show that $(n-k) \binom{n}{k} = (k+1) \binom{n}{k+1}$.


3. Suppose that events A, B, and C are independent with P(A) = 1/3, P(B) = 1/4, and
P(A ∪ B ∪ C) = 3/4. Find P(C).


4. Events A and B are such that P(A ∪ B) = 0.8 and P(A) = 0.2. For what value of P(B) are


(a) A and B independent?
(b) A and B mutually exclusive?


5. Events A, B, and C in a sample space have P(A) = 0.2, P(B) = 0.4, and P(A ∪ B ∪ C) =
0.9. Find P(C) if A and B are mutually exclusive, A and C are independent, and B and
C are independent.


6. How many distinguishable arrangements of the letters in PROBABILITY are there?


7. How many people must be in a group so that the probability that at least two were born
on the same day of the week is at least 1/2?


8. A and B are special dice. The faces on die A are 2, 2, 5, 5, 5, 5 and the faces on die
B are 3, 3, 3, 6, 6, 6. The two dice are rolled. What is the probability that the number
showing on die B is greater than the number showing on die A?


9. A committee of 5 is chosen from a group of 8 men and 4 women. What is the probability
the group contains a majority of women?


10. A college senior finds he needs one more course for graduation and finds only courses in
Mathematics, Chemistry, and Computer Science available. On the basis of interest, he
assigns probabilities of 0.1, 0.6, and 0.3, respectively, to the events of choosing each of
these. After considering his past performance, his advisor estimates his probabilities of
passing these courses as 0.8, 0.7, and 0.6, respectively, regarding the passing of courses
as independent events.

(a) What is the probability he passes the course if he chooses a course at random?

(b) Later we find that the student graduated. What is the probability he took
Chemistry?

11. A number, X, is chosen at random from the set {10, 11, 12, ..., 99}.

(a) Find the probability that the 10’s digit in X is less than the units digit.

(b) Find the probability that X is at least 50.

(c) Find the probability that the 10’s digit in X is the square of the units digit.

12. If the integers 1, 2, 3, and 4 are randomly permuted, what is the probability that 4 is to
the left of 2?

In a sample space, events A and B are such that P(A) = P(B), P(A fa) B) = P(AANB)= 

1/6. Find 

(a) P(A). | 

(b) P(A UB). 

(c) P(Exactly one of the events A or B). 

A fair coin is tossed four times. Let A be the event “‘2nd toss is heads,” B be the event 

“Exactly 3 heads,” and C be the event “4th toss is tails if the 2nd toss is heads.” Are A, 

B, and C independent? 


An instructor has decided to grade each of his students A, B, or C. He wants the prob- 
ability a student receives a grade of B or better to be 0.7 and the probability a student 
receives at most a grade of B to be 0.8. Is this possible? If so, what proportions of each 
letter grade must be assigned? 

How many bridge hands are there containing 3 hearts, 4 clubs, and 6 spades? 

A day’s production of 100 fuses is inspected by a quality control inspector who tests 10 
fuses at random, sampling without replacement. If he finds 2 or fewer defective fuses, 
he accepts the entire lot of 100 fuses. What is the probability the lot is accepted if it 
actually contains 20 defective fuses? 

18. Suppose that A and B are events for which P(A) = a, P(B) = b, and $P(A \cap B) = c$.
Express each of the following in terms of a, b, and c.
(a) P(AUB)
(b) P(ANB)
(c) P(AUB)
(d) P(ANB)
(e) P(exactly one of A or B occurs).

19. An elevator starts with 10 people on the first floor of an eight-story building and stops
at each floor.
(a) In how many ways can all the people get off the elevator?
(b) How many ways are there for everyone to get off if no one gets off on some two
specific floors?
(c) How many ways are there for everyone to get off if at least one person gets off
at each floor?

20. A manufacturer of calculators buys integrated circuits from suppliers A, B, and C. Fifty
percent of the circuits come from A, 30% from B, and 20% from C. One percent of the
circuits supplied by A have been defective in the past, 3% of B’s have been defective,
and 4% of C’s have been defective. A circuit is selected at random and found to be
defective. What is the probability it was manufactured by B?

21. Suppose that E and T are independent events with P(E) = P(T) and $P(E \cup T) = 1/2$.
What is P(E)?

22. A quality control inspector draws parts one at a time and without replacement from a
set containing 5 defective and 10 good parts. What is the probability the third defective
is found on the eighth drawing?

23. If A, B, and C are independent events, show that the events A and $B \cup C$ are independent.

24. Bean seeds from supplier A have an 85% germination rate and those from supplier B
have a 75% germination rate. A seed company purchases 40% of their bean seeds from
supplier A and the remaining 60% from supplier B and mixes these together. If a seed
germinates, what is the probability it came from supplier A?

25. An experiment consists of choosing two numbers without replacement from the set
{1, 2, 3, 4, 5, 6} with the restriction that the second number chosen must be greater
than the first.
(a) Describe the sample space.
(b) What is the probability the second number is even?
(c) What is the probability the sum of the two numbers is at least 5?

26. What is the probability a poker hand contains exactly one pair?

27. A box contains 6 good and 8 defective light bulbs. The bulbs are drawn out one at a
time, without replacement, and tested. What is the probability that the fifth good item
is found on the ninth test?

28. An individual tried by a three-judge panel is declared guilty if at least two judges cast
votes of guilty. Suppose that when the defendant is, in fact, guilty, each judge will
independently vote guilty with probability 0.7 but, if the defendant is, in fact, innocent,
each judge will independently vote guilty with probability 0.2. Assume that 70% of the
defendants are actually guilty. If a defendant is judged guilty by the panel of judges,
what is the probability he is actually innocent?

29. What is the probability a bridge hand is missing cards in at least one suit?

30. Suppose 0.1% of the population is infected with a certain disease. On a medical test for
the disease, 98% of those infected give a positive result while 1% of those not infected
give a positive result. If a randomly chosen person is tested and gives a positive result,
what is the probability the person has the disease?


A committee of 50 politicians is to be chosen from the 100 US Senators (2 are from 
each state). If the selection is done at random, what is the probability that each state 
will be represented? 


In a roll of a pair of dice (one red and one green), let A be the event “red die shows 3, 
4, or 5,” B the event “green die shows a 1 or a 2,” and C the event “dice total 7.” Show
that A, B, and C are independent. 


An oil wildcatter thinks there is a 50-50 chance that oil is on the property he purchased. 
He has a test for oil that is 80% reliable: that is, if there is oil, it indicates this with 
probability 0.80 and if there is no oil, it indicates that with probability 0.80. The test 
indicates oil on the property. What is the probability there really is oil on the property? 

Given: A and B are events with P(A) = 0.3, P(B) = 0.7, and $P(A \cup B) = 0.9$. Find

(a) $P(A \cap B)$

(b) $P(B \mid A)$.

Two good transistors become mixed up with three defective transistors. A person is 
assigned to sampling the mixture by drawing out three items without replacement. 
However, the instructions are not followed and the first item is replaced, but the second 
and third items are not replaced. 

(a) What is the probability the sample contains exactly two items that test as good? 
(b) What is the probability the two items finally drawn are both good transistors? 
How many lines are determined by 8 points, no three of which are collinear? 

Show that if A and B are independent, then A and B are independent. 

How many tosses of a fair coin are needed so that the probability of at least one head 
is at least 0.99? 

A lot of 24 tubes contains 13 defective ones. The lot is randomly divided into two equal 
groups, and each group is placed in a box. 

(a) What is the probability that one box contains only defective tubes? 


(b) Suppose the tubes were divided so that one box contains only defective tubes. A 
box is chosen at random and one tube is chosen from the chosen box and is found 
to be defective. What is the probability a second tube chosen from the same box is 
also defective? 


A machine is composed of two components, A and B, which function (or fail) indepen- 
dently. The machine works only if both components work. It is known that component 
A is 98% reliable and the machine is 95% reliable. How reliable is component B? 


Suppose A and B are events. Explain why P(exactly one of events A, B occurs) = 
$P(A) + P(B) - 2P(A \cap B)$.

A box contains 8 red, 3 white, and 9 blue balls. Three balls are to be drawn, without 
replacement. What is the probability that more blues than whites are drawn? 


A marksman, whose probability of hitting a moving target is 0.6, fires three shots. 
Suppose the shots are independent. 


(a) What is the probability the target is hit? 


(b) How many shots must be fired to make the probability at least 0.99 that the target 
will be hit? 
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A box contains 6 green and 11 yellow balls. Three are chosen at random. The first 
and third balls are yellow. Which method of sampling —with replacement or without 
replacement— gives the higher probability of this event? 


A box contains slips of paper numbered from 1 to m. One slip is drawn from the box;
if it is 1, it is kept; otherwise, it is returned to the box. A second slip is drawn from the
box. What is the probability the second slip is numbered 2? 


Three integers are selected at random from the set {1, 2, ... , 10}. What is the proba- 
bility the largest of these is 5? 


A pair of dice is rolled until a 5 or a 7 appears. What is the probability a 5 occurs first? 


The probability is 1 that a fisherman will say he had a good day when, in fact, he did, 

but the probability is only 0.6 that he will say he had a good day when, in fact, he did 

not. Only 1/4 of his fishing days are actually good days. What is the probability he had 

a good day if he says he had a good day? 

An inexperienced employee mistakenly samples n items from a lot of N items, with 

replacement. What is the probability the sample contains at least one duplicate? 

A roulette wheel has 38 slots— 18 red, 18 black, and 2 green (the house wins on green). 

Suppose the spins of the wheel are independent and that the wheel is fair. The wheel 

is spun twice and we know that at least one spin is green. What is the probability that 

both spins are green? 

A “rook” deck of cards consists of four suits of cards: red, green, black, and yellow, 

each suit having 14 cards. In addition, the deck has an uncolored “rook” card. A hand 

contains 14 cards. 

(a) How many different hands are possible? 

(b) How many hands have the rook card? 

(c) How many hands contain only two colors with equal numbers of cards of each 
color? 

(d) How many hands have at most three colors and no rook card? 

Find the probability a poker hand contains 3 of a kind (exactly 3 cards of one face value 

and 2 cards of different face values). 

A box contains tags numbered 1, 2,...,n. Two tags are chosen without replacement. 

What is the probability they are consecutive integers? 

In how many different ways can n people be seated around a circular table? 

A production lot has 100 units of which 25 are known to be defective. A random sample 

of 4 units is chosen without replacement. What is the probability that the sample will 

contain no more than 2 defective units? 

A recent issue of a newspaper said that given a 5% probability of an unusual event in a 

1-year study, one should expect a 35% probability in a 7-year study. This is obviously 

faulty. What is the correct probability? 

Independent events A and B have probabilities $p_1$ and $p_2$, respectively. Show that the
probability of either two successes or two failures in two trials has probability 1/2 if
and only if at least one of $p_1$ and $p_2$ is 1/2.
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Chapter 2 


Discrete Random Variables and 
Probability Distributions 


At this point, we have considered discrete sample spaces and we have derived theorems 
concerning probabilities for any discrete sample space and some of the events within it. 
Often, however, events are most easily described by performing some operation on the 
sample points. For example, if two dice are tossed, we might consider the sum showing on 
the two dice; but when we find the sum, we have operated on the sample point seen. Other 
operations, as we will see, are commonly encountered. 

We want to consider some properties of the sum; we start with the sample space. In this 
example, a natural sample space shows the result on each die and, if the dice are fair, leads 
to equally likely sample points. Then, the sample space consists of the 36 points in $S_1$:

$$S_1 = \{(1, 1), (1, 2), \ldots, (1, 6), (2, 1), \ldots, (6, 6)\}.$$

These points are shown in Figure 2.1.
If we consider the sum on the two dice, then a sample space

$$S_2 = \{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}$$


might be considered, but now the sample points are not equally likely. 
We call the sum in this example a random variable. 


Definition: A random variable is a real-valued function defined on the points of a sample 
space. 


Various functions occur commonly and we will be interested in a variety of them; sums 
are among the most interesting of these functions as we will see. We will soon determine 
the probabilities of various sums, but the determination of these is probably evident now 
to the reader. We first need, for this problem as well as for others, some ideas and some 
notation. 


2.1 RANDOM VARIABLES 


Since we have considered only discrete sample spaces to this point, we discuss discrete 
random variables in this chapter. 


Probability: An Introduction with Statistical Applications, Second Edition. John J. Kinney. 
© 2015 John Wiley & Sons, Inc. Published 2015 by John Wiley & Sons, Inc. 
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First, consider another example. It is convenient to let X denote the number of times 
an examination is attempted until it is passed. X in this case denotes a random variable; 
we will use capital letters to denote random variables and small letters to denote values of 
random variables. Following are some points of the infinite sample space, with the value
x of X at each point.

Event   x
P       1
FP      2
FFP     3
FFFP    4


Clearly, we see that the event “X = 3” is equivalent to the event “FFP” and so their proba- 
bilities must be equal. Therefore, 


P(X = 3) = P(FFP) = 1/8.


The terminology “random variable” is curious since we could, in the earlier example, define 
a variable, say Y, to be 6 regardless of the outcome of the experiment. Y would carry 
no information whatsoever, and it would be neither random nor variable! There are other 
curiosities with terminology in probability theory as well, but they have become, alas, stan- 
dard in the field and so we accept them. What we call here a “random variable” is in reality 
a function whose domain is the sample space and whose range is the real line. The random 
variable here, as in all cases, provides a mapping from the sample space to the real line. 
While being technically incorrect, the phrase “random variable” seems to convey the cor- 
rect idea. This perhaps becomes a bit more clear when we use functional notation to define 
a function f(x) to be 
f(x) = P(X = x), 


where x denotes a value of the random variable X. 

In the earlier example, we could then write f(3) = 1/8. 

The function f(x) is called a probability distribution function (abbreviated as pdf) for 
the random variable X. 

Since probabilities must be nonnegative and since the probabilities must sum to 1, we 
see that 


[1] $f(x) \ge 0$ and
[2] $\sum_{S} f(x) = 1$, where S is the sample space.


We turn now to some examples of random variables. 


Example 2.1.1 


Throw a fair die once and let X denote the result. The random variable X can assume the
values 1, 2, 3, 4, 5, 6, and so

$$P(X = x) = \begin{cases} \dfrac{1}{6}, & x = 1, 2, 3, 4, 5, 6 \\ 0, & \text{otherwise.} \end{cases}$$


A graph of this function is of course flat; it is shown in Figure 2.1. This is an example of a 
discrete uniform probability distribution. 

The use of a computer algebra system for sampling from this distribution is explained 
in Appendix A. 


Figure 2.1 Discrete uniform probability distribution (probability 1/6 plotted against each face, 1 through 6).

Example 2.1.2 


In the previous example, the die is fair, so now we consider an unfair die. In particular, 
could the die be weighted so that the probability a face appears is proportional to the face? 

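Suppose that X denotes the face that appears and let $P(X = x) = k \cdot x$, where k denotes
the constant of proportionality. The probability distribution function is then

$$P(X = x) = \begin{cases} k & \text{if } x = 1 \\ 2k & \text{if } x = 2 \\ 3k & \text{if } x = 3 \\ 4k & \text{if } x = 4 \\ 5k & \text{if } x = 5 \\ 6k & \text{if } x = 6 \end{cases}$$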


The sum of these probabilities must be 1, so 
k+2k+3k+4k+5k+6k=1, 


hence k = 1/21 and the weighting is possible. 
The probability distribution function is then 


$$P(X = x) = \begin{cases} \dfrac{x}{21}, & x = 1, 2, 3, 4, 5, 6 \\ 0, & \text{otherwise.} \end{cases}$$


A procedure for selecting a random sample from this distribution is explained in 
Appendix A. 
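
As a rough illustration of Example 2.1.2 (not the computer algebra procedure of Appendix A), the following Python sketch computes the constant k exactly and draws a sample from the loaded die. The names `pmf` and `sample_loaded_die` are hypothetical, chosen only for this sketch.

```python
from fractions import Fraction
import random

faces = [1, 2, 3, 4, 5, 6]

# P(X = x) = k * x and the probabilities must sum to 1,
# so k = 1 / (1 + 2 + ... + 6).
k = Fraction(1, sum(faces))
pmf = {x: k * x for x in faces}


def sample_loaded_die(n, seed=None):
    """Draw n faces, each face chosen with probability proportional to its value."""
    rng = random.Random(seed)
    return rng.choices(faces, weights=faces, k=n)


print(k)                       # the constant of proportionality
print(sum(pmf.values()))       # total probability
print(sample_loaded_die(10, seed=1))
```

Exact fractions are used for the distribution so that the total probability can be checked without rounding error; the sampler only needs the weights up to a constant factor, so the face values themselves serve as weights.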
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Example 2.1.3 


Now we return to the experiment consisting of throwing two fair dice. We want to investi- 
gate the probabilities of the various sums that can occur. Let the random variable X denote 
the sum that appears. Then, for example, 


$$P(X = 5) = P[(1, 4) \text{ or } (2, 3) \text{ or } (3, 2) \text{ or } (4, 1)] = \frac{4}{36} = \frac{1}{9}.$$

So we have determined the probability of one sum. Others can be determined in a similar
way.

The experiment could then be described by giving all the values for the probability 
distribution function (or pdf), P(X = x), where, as earlier, x denotes a value for the random 
variable X, as we saw in Example 2.1.2. 

In this example, it is easy to find that 


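$$P(X = x) = \begin{cases} \dfrac{1}{36} & \text{if } x = 2 \text{ or } 12 \\ \dfrac{2}{36} & \text{if } x = 3 \text{ or } 11 \\ \dfrac{3}{36} & \text{if } x = 4 \text{ or } 10 \\ \dfrac{4}{36} & \text{if } x = 5 \text{ or } 9 \\ \dfrac{5}{36} & \text{if } x = 6 \text{ or } 8 \\ \dfrac{6}{36} & \text{if } x = 7 \\ 0 & \text{otherwise} \end{cases}$$

We see that

$$P(X = x) = P(X = 14 - x) = \frac{x - 1}{36} \quad \text{for } x = 2, 3, 4, 5, 6, 7, \text{ and}$$

$$P(X = x) = 0 \text{ otherwise.}$$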


A graph of this function shows a tent-like shape shown in Figure 2.2. 


Figure 2.2 Sums on two fair dice (probability plotted against the sum, 2 through 12).
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The sums when two dice are thrown then behave quite differently from the behavior of 
the individual dice. In fact, we note that if the random variable $X_1$ denotes the result showing
on the first die and the random variable $X_2$ denotes the result showing on the second die,
then $X = X_1 + X_2$. The random variable X can be expressed as a sum of random variables.
While $X_1$ and $X_2$ are uniform, X most decidedly is not uniform. There is a theoretical reason
for this behavior, which will be discussed in a later chapter. It is sufficient to note here that 
this is, in fact, not unusual, but very typical behavior for a sum of random variables. 

A natural inquiry at this point is, “What is the probability distribution of the sum on 
three fair dice?” 

It is more difficult to work out the distribution here than it was for two dice. Although 
we will show another solution later, we give one approach to the problem at this time. 
Consider, for example, a sum of 10 on three dice. The sum could have arisen from these 
combinations of results showing on the individual dice (which do not indicate which die
showed which face):

(2, 2, 6), (3, 3, 4), (2, 4, 4),
(3, 1, 6), (3, 2, 5), (5, 1, 4).

Each of the first three of these combinations could occur in three different orders
(corresponding to the three different dice), while each of the last three could occur in six
different orders. This gives a total of 27 possibilities, each of which has probability 1/216.
Therefore, P(X = 10) = 27/216. A similar process could be followed for other values of the sum; the
complete probability distribution can be found to be


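$$P(X = x) = \begin{cases} \dfrac{1}{216} & \text{if } x = 3 \text{ or } 18 \\ \dfrac{3}{216} & \text{if } x = 4 \text{ or } 17 \\ \dfrac{6}{216} & \text{if } x = 5 \text{ or } 16 \\ \dfrac{10}{216} & \text{if } x = 6 \text{ or } 15 \\ \dfrac{15}{216} & \text{if } x = 7 \text{ or } 14 \\ \dfrac{21}{216} & \text{if } x = 8 \text{ or } 13 \\ \dfrac{25}{216} & \text{if } x = 9 \text{ or } 12 \\ \dfrac{27}{216} & \text{if } x = 10 \text{ or } 11 \\ 0 & \text{otherwise} \end{cases}$$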


A computer algebra system may also be used to find the probability distribution for X. 
Many systems will give all the permutations, each of which may be summed and the relative 
frequencies recorded. This is shown in Appendix A. There are other methods that can be 
used to solve the problem; one of these will be discussed in Chapter 4. 
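
The enumeration described above can be sketched in a few lines of Python rather than a computer algebra system. This is only an illustration; the function name `sum_distribution` is an assumption of this sketch and does not come from Appendix A.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sum_distribution(n_dice, faces=range(1, 7)):
    """Exact distribution of the sum on n_dice fair dice, found by listing every outcome."""
    faces = list(faces)
    # Every ordered outcome is equally likely, so count how often each sum occurs.
    counts = Counter(sum(roll) for roll in product(faces, repeat=n_dice))
    total = len(faces) ** n_dice
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}


two = sum_distribution(2)
three = sum_distribution(3)
print(two[5])     # 1/9
print(three[10])  # 1/8
```

Listing all 36 or 216 ordered outcomes is feasible here because the sample spaces are small; the count grows as 6 to the power of the number of dice, which is why other methods are preferred for many dice.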

A graph of this function is shown in Figure 2.3. It begins to show what we will call 
a normal probability distribution shape. As the number of dice increases, the “curve” the 
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eye sees smooths out to resemble a normal probability distribution; the distribution for 6 
or more dice is remarkably close to the normal distribution. We will discuss the normal 
distribution in Chapter 3. 


Figure 2.3 Sums on three fair dice (probability plotted against the sum, 3 through 18).


Example 2.1.4 


We saw in Example 2.1.2 that a single die could be loaded so that the probability of the 
occurrence of a face is proportional to the face. Can we load a die so that when the die is 
thrown twice the probability of a sum is proportional to the sum? 

If P(X = i) is denoted by $P_i$, for i = 1, 2, 3, 4, 5, 6, and if k is the constant of proportionality,
then $P_1^2 = 2k$, $2P_1P_2 = 3k$, $2P_1P_3 + P_2^2 = 4k$, and so on, together with the restriction
that $\sum_{i=1}^{6} P_i = 1$, giving a system of 12 equations in 7 unknowns. Unfortunately, this set of
equations has no solution, so we cannot load the die in the manner suggested.
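
One way to see the inconsistency numerically is sketched below in Python. The approach is an assumption of this sketch, not the author's argument: since the eleven sum probabilities must total 1, k = 1/77; the equations for sums 2 through 7 then determine the six face probabilities one at a time, and the remaining equations can be checked against them.

```python
import math

# If P(sum = s) = k * s for s = 2, ..., 12, these eleven probabilities total 1,
# so k = 1 / (2 + 3 + ... + 12) = 1/77.
k = 1 / sum(range(2, 13))

# Solve the equations for sums 2 through 7 one at a time.
# P(sum = s) is the sum of P_i * P_j over all i + j = s (faces numbered from 1).
p = [math.sqrt(2 * k)]                      # P_1^2 = 2k
for s in range(3, 8):
    n = s - 1                               # the new unknown is P_n
    known = sum(p[i - 1] * p[s - i - 1] for i in range(2, n))
    p.append((k * s - known) / (2 * p[0]))  # 2 * P_1 * P_n + known = k * s

print(sum(p))                # would have to equal 1
print(p[5] ** 2, 12 * k)     # would have to be equal (equation for a sum of 12)
```

Both checks fail by a wide margin, so the face probabilities forced by the first six equations contradict the others.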


Example 2.1.5 


Let us look now at the sum when two loaded dice are thrown. First, let each die be loaded 
so that the probability a face occurs is proportional to that face, as in Example 2.1.2. The
sample space of 36 points can be used to determine the probabilities of the various sums. 
Figure 2.4 shows these probabilities. We see that the symmetry we noticed in Figures 2.1 
and 2.3 is now gone. 

Now suppose one die is loaded so that the probability a face appears is proportional to 
that face while a second die is loaded so that the probability face i appears is proportional
to $7 - i$, i = 1, 2, ..., 6. The probabilities of various sums are then shown in Figure 2.5. Now
symmetry around x = 7 has returned.

The appearance, once more, of the normal-like shape is striking. The reader with access 
to a computer algebra system may want to find the probability distribution of the sums on 
four dice, two loaded in each manner as in this example. The result is remarkably normal. 
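
The two cases in this example can be reproduced with a short Python sketch that combines the two dice outcome by outcome. The names `up`, `down`, `sum_pmf`, and `is_symmetric` are hypothetical and used only here.

```python
from fractions import Fraction

faces = range(1, 7)
up = {i: Fraction(i, 21) for i in faces}        # P(face i) proportional to i
down = {i: Fraction(7 - i, 21) for i in faces}  # P(face i) proportional to 7 - i


def sum_pmf(die_a, die_b):
    """Distribution of the sum showing on two independent dice."""
    out = {}
    for i, p_i in die_a.items():
        for j, p_j in die_b.items():
            out[i + j] = out.get(i + j, 0) + p_i * p_j
    return out


def is_symmetric(pmf):
    """True if P(sum = s) = P(sum = 14 - s) for every possible sum s."""
    return all(pmf[s] == pmf[14 - s] for s in pmf)


same = sum_pmf(up, up)      # two similarly loaded dice
mixed = sum_pmf(up, down)   # two differently loaded dice
print(is_symmetric(same), is_symmetric(mixed))
```

Applying `sum_pmf` repeatedly (for instance to `same` and then to the sum of two `down` dice) gives the four-dice distribution suggested in the text.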
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Figure 2.4 Sums on two similarly loaded dice. 
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Figure 2.5 Sums on two differently loaded dice. 


Example 2.1.6 


Sample spaces in the examples in this chapter so far have been finite. Our final example 
involves a countably infinite sample space. Consider observing single births until a girl is 
born. Let the random variable X denote the number of births necessary. Assuming the births 
to be independent, 


$$P(X = x) = \left(\frac{1}{2}\right)^{x}, \quad x = 1, 2, 3, \ldots$$


To check that P(X = x) is a probability distribution, note that P(X = x) > 0 for all x. The 
sum of all the probabilities is 


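$$S = \sum_{x=1}^{\infty} P(X = x) = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots$$

To calculate this sum, note that

$$\frac{1}{2} S = \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots$$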
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Subtracting the second series from the first series gives 


$$\frac{1}{2} S = \frac{1}{2}, \quad \text{so } S = 1.$$


Another way to sum the series is to recognize that it is an infinite geometric series of
the form

$$S = a + ar + ar^2 + \cdots$$

and the sum of this series is known to be

$$S = \frac{a}{1 - r} \quad \text{if } |r| < 1.$$

In this case, a is 1/2 and r is also 1/2, so the sum is 1.

Here, X is called a geometric random variable. A graph of P(X =x) appears in 
Figure 2.6. 

Since $P(X = x + 1) = \frac{1}{2} P(X = x)$, the probabilities decline rapidly in size.


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Figure 2.6 Geometric distribution. 


2.2 DISTRIBUTION FUNCTIONS 


Another function often useful in probability problems is called the distribution function. 
For a discrete random variable, we denote this function by F(x) where 


$$F(x) = P(X \le x), \quad \text{so}$$

$$F(x) = \sum_{t \le x} f(t).$$


F(x) is also known as a cumulative distribution function (abbreviated cdf) since it accu- 
mulates probabilities. Note the distinction now between f(x), the probability distribution 
function (pdf), and F(x), the cumulative distribution function (cdf). 
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In Chapter 1, we used the reliability of a component where R(t) = P(T > t) so 
$$R(t) = 1 - F(t),$$

establishing a relationship between R(t) and the distribution function.


Example 2.2.1 


For the fair die whose probability distribution function is given in Example 2.1.1, we find 


F(1) = 1/6, F(2) = 2/6, F(3) = 3/6, F(4) = 4/6, F(5) = 5/6, F(6) = 1. 


It is also customary to show this function for any value of the random variable X. Here,
for example, $F(3.4) = P(X \le 3.4) = 3/6$. Since F(x) is defined for any value of X, we draw
a continuous graph, unlike the graph of the probability distribution function. We see that in
this case

$$F(x) = \begin{cases} 0, & \text{if } x < 1 \\ \dfrac{1}{6}, & \text{if } 1 \le x < 2 \\ \dfrac{2}{6}, & \text{if } 2 \le x < 3 \\ \dfrac{3}{6}, & \text{if } 3 \le x < 4 \\ \dfrac{4}{6}, & \text{if } 4 \le x < 5 \\ \dfrac{5}{6}, & \text{if } 5 \le x < 6 \\ 1, & \text{if } 6 \le x \end{cases}$$


A graph of this function is shown in Figure 2.7. It is a series of step functions, since, 
when f(x) is scanned from the right, F(x) can increase only at those points where f(x) is 
not zero. 
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Figure 2.7 Distribution function for one toss of a fair die. 
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It is clear from the definition of F(x) and from the fact that probabilities are in the 
interval [0,1] that 


$0 \le F(x) \le 1$ and that
$F(a) \ge F(b)$ if $a \ge b$.
It is also true, for discrete random variables taking integer values, that
$P(a \le X \le b) = P(X \le b) - P(X \le a - 1) = F(b) - F(a - 1)$.
Individual probabilities, say P(X = a), can be found by
$P(X = a) = P(X \le a) - P(X \le a - 1) = F(a) - F(a - 1)$.


These probabilities are then the size of the “steps” in the distribution function. 
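
These relations can be checked for the fair die of Example 2.2.1 with a small Python sketch; the function name `cdf` is an assumption of this illustration.

```python
from fractions import Fraction

# Probability distribution function of the fair die in Example 2.1.1.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def cdf(x, pmf=pmf):
    """F(x) = P(X <= x): add up f(t) over all values t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))


print(cdf(3.4))          # 1/2
print(cdf(5) - cdf(1))   # P(2 <= X <= 5) = F(5) - F(1) = 2/3
print(cdf(4) - cdf(3))   # P(X = 4) = F(4) - F(3) = 1/6
```

Because `cdf` accepts any real argument, it returns the same value everywhere between two consecutive integers, which is the step behavior seen in Figure 2.7.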


EXERCISES 2.2 


1. Suppose the probability distribution function for a random variable, X, is P(X = x) = 
1/5 for x = 1,2, 3,4,5. 
(a) Find P(X > 3). 
(b) Find P(X is even). 
2. Draw a graph of the cumulative distribution function in problem 1. 
3. A fair coin is tossed four times. 
(a) Show a sample space for the experiment and assign probabilities to the sample 
points. 


(b) Suppose a count of the total number of heads (X) and the total number of tails (Y) 
is made after each toss. What is the probability that X always exceeds Y? 
(c) What is the probability, after four tosses, that X is even if we know that Y > 1? 

4. A single expensive electronic part is to be manufactured, but the manufacture of a 
successful part is not guaranteed. The first attempt costs $100 and has a 0.7 probabil- 
ity of success. Each attempt thereafter costs $60 and has a 0.9 probability of success. 
The outcomes of various attempts are independent, but at most three attempts can be 
made at successful manufacture. The finished part sells for $500. Find the probability 
distribution for N, the net profit. 

5. An automobile dealer has found that X, the number of cars customers buy each week, 
follows the probability distribution 


ko? 
fe) =4 al 


0, otherwise 


, x= 1,2,3,4. 


(a) Find k. 
(b) Find the probability the dealer sells at least two cars in a week. 
(c) Find F(x), the cumulative distribution function. 
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. Job interviews last 1/2 hour. The interviewer knows that the probability an applicant is 


qualified for the job is 0.8. The first person interviewed who is qualified is selected for 
the job. If the qualifications of any one applicant is independent of the qualifications 
of any other applicant, what is the probability that 2 hours is sufficient time to select a 
person for the job? 


Verify the probability distribution for the sum on three fair dice as given in Example
2.1.3.


5 
- (a) Since (G + ; = | and since each term in the binomial expansion of (5 + ;) 


is greater than 0, it follows that the individual terms in the binomial expansion 
are probabilities. Suggest an experiment and a sample space for which these terms 
represent probabilities of the sample points. 


(b) Answer part (a) for (p+q)", g=1—p,0<p<l. 


. Two loaded dice are tossed. Each die is loaded so that the probability a face, i, appears 


is proportional to 7 — i. Find the probability distribution for the sum that appears. Draw 
a graph of the probability distribution function. 


. Suppose that X is a random variable giving the number of tosses necessary for a fair 


coin to turn up heads. Find the probability that X is even. 


. The random variable Y has the probability distribution g(y) = - if y = 2,3,4, or5. 


Find G(y), the distribution function for Y. 
1 x 


. Find the distribution function for the geometric distribution f(x) = is) <= 


2 
T2532 


A random variable, X, has the distribution function 


F(x) = 


Find the probability distribution function, f(x). 


A random variable X is defined on the integers 0, 1, 2, 3, ... and has distribution 
function F(x). Find expressions, in terms of F(x), for the following: 


(a) Pia< X <b) 
(b) Pia< X <b) 
(c) Pla<X <b) 
(d) Pa<X <b). 
If f(x) = 1/n, x = 1, 2,3,...,n (so that each value of X has the same probability), then 


X is called a discrete uniform random variable. Find the distribution function for this 
random variable. 
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2.3 EXPECTED VALUES OF DISCRETE RANDOM 
VARIABLES 


Expected Value of a Discrete Random Variable 


Random variables are easily distinguished by their probability distribution functions. They 
are also often characterized or described by measures that summarize these distributions. 
Usually, “average” values, or measures of centrality, and some measure of their dispersion, 
or variability, are found as values characteristic of the distribution. 

We begin with the definition of an average value for a discrete random variable, X, 
denoted by E(X), or $\mu_x$, which we will call the expectation, or expected value, or mean, or
mean value (all of these terms are in common usage) of X:


Definition: $E(X) = \mu_x = \sum_{x} x \cdot P(X = x)$,
provided the sum converges, where the summation occurs over all the discrete values of 
the random variable, X. Note that each value of the random variable X is weighted by its 
probability in the sum. 

The provision that the sum be convergent cautions us that the sum may, indeed, be 
infinite. There are random variables, otherwise seemingly well behaved, which have no 
mean value. 

This definition is, in reality, a simple extension of what the reader would recognize as 
an average value. Consider an example: 


Example 2.3.1 


A student has examination grades of 82, 91, 79, and 96 in a course in probability. We would 
no doubt calculate the average grade as 


$$\frac{82 + 91 + 79 + 96}{4} = 87.$$

This could also be calculated as

$$82 \cdot \frac{1}{4} + 91 \cdot \frac{1}{4} + 79 \cdot \frac{1}{4} + 96 \cdot \frac{1}{4} = 87,$$

where the examination scores have now been equally weighted. Should the instructor decide
to weight the fourth examination three times as much as any one of the other examinations,
this simply changes the weights and the average examination grade is then

$$82 \cdot \frac{1}{6} + 91 \cdot \frac{1}{6} + 79 \cdot \frac{1}{6} + 96 \cdot \frac{3}{6} = 90.$$


So the idea of adding scores multiplied by their probabilities is not a new one. This is exactly 
what we do when we calculate E(X). 
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Example 2.3.2 


If a fair die is thrown once, as in Example 2.1.1, the average result is 


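$$\mu_x = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = \frac{7}{2}.$$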


So we recognize 7/2, or 3.5, as the average result, although 3.5 is not a possible value 
for the face showing on the die. What is the meaning of this? The interpretation is as follows: 
if we threw a fair die a large number of times, we would expect each of the faces from 1 to
6 to occur about 1/6th of the time, so the average result would be given by $\mu_x$. We could, of
course, expect some deviation from this result in actual practice; the size of the deviation
decreases as the number of tosses of the die increases. Later we will see that a deviation of
more than about 0.11 in the average is highly unlikely in 1000 tosses of the die, that is, the 
average is almost certain to fall in the interval from 3.39 to 3.61. If the deviation is more 
than 0.11, we would no doubt conclude that the die is an unfair one. 


Example 2.3.3 


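What is the average result on the loaded die where P(X = i) = i/21, for i = 1, 2, 3, 4, 5, 6?

Here,

$$E(X) = 1 \cdot \frac{1}{21} + 2 \cdot \frac{2}{21} + 3 \cdot \frac{3}{21} + 4 \cdot \frac{4}{21} + 5 \cdot \frac{5}{21} + 6 \cdot \frac{6}{21} = \frac{91}{21} = \frac{13}{3}.$$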
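
The averages for the fair die and the loaded die can be checked by machine. The following is a minimal Python sketch, not part of the original text; the helper name `expected_value` and the dictionary layout are our own choices. It uses exact fractions so that no rounding enters the result.

```python
from fractions import Fraction

def expected_value(pmf):
    """Return E(X), the sum of x * P(X = x), for a finite pmf given as {x: probability}.

    The name and the dictionary representation are illustrative assumptions.
    """
    return sum(x * p for x, p in pmf.items())

fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
loaded_die = {x: Fraction(x, 21) for x in range(1, 7)}

print(expected_value(fair_die))    # 7/2
print(expected_value(loaded_die))  # 13/3
```

Any finite distribution can be passed in the same way, provided its probabilities sum to 1.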


Example 2.3.4 


In Example 2.1.3, we determined the probability distribution for X, the sum showing on 
two fair dice. Then, we find 


$$E(X) = 2 \cdot \frac{1}{36} + 3 \cdot \frac{2}{36} + 4 \cdot \frac{3}{36} + \cdots + 12 \cdot \frac{1}{36} = 7.$$


Now let X_1 denote the face showing on the first die and let X_2 denote the face showing on
the second die. We found in Example 2.3.2 that E(X_i) = 7/2, for i = 1, 2. We note here that

$$E(X) = E(X_1) + E(X_2),$$

so that the expectation of the sum is the sum of the expectations of the sum’s components;
this is in fact generally true and so is no coincidence. We will discuss this further in
Chapter 5.
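
A brute-force check of this observation is easy to write. The sketch below is ours, not the text's: it enumerates the 36 equally likely ordered pairs of faces and compares the expected sum with twice the expected value of a single die.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Each of the 36 ordered pairs (first die, second die) has probability 1/36.
e_sum = sum(Fraction(a + b, 36) for a, b in product(faces, repeat=2))

# Expected value of a single fair die.
e_one = sum(Fraction(a, 6) for a in faces)

print(e_sum, e_one + e_one)  # 7 7
```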


Example 2.3.5 


Sometimes, the calculation of an expected value will involve an infinite series. Suppose
we toss a coin, loaded to come up heads with probability p, until heads occur. Since the
tosses are independent, and since the event, “First head on toss x,” is equivalent to x − 1
tails followed by heads, it follows that

$$P(X = x) = q^{x-1} p, \quad x = 1, 2, 3, \ldots, \quad \text{where } q = 1 - p.$$


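We check first that ∑ P(X = x) = 1. Here,

$$\sum_{x=1}^{\infty} P(X = x) = p + q \cdot p + q^2 \cdot p + q^3 \cdot p + \cdots = p \cdot (1 + q + q^2 + q^3 + \cdots) = \frac{p}{1 - q} = 1.$$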


Then, 


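$$E(X) = \sum_{x=1}^{\infty} x \cdot q^{x-1} p = p + 2 \cdot q \cdot p + 3 \cdot q^2 \cdot p + 4 \cdot q^3 \cdot p + \cdots$$

To simplify this, notice that

$$q \cdot E(X) = q \cdot p + 2 \cdot q^2 \cdot p + 3 \cdot q^3 \cdot p + 4 \cdot q^4 \cdot p + \cdots$$

By subtracting q · E(X) from E(X), we find that

$$E(X) - q \cdot E(X) = p + q \cdot p + q^2 \cdot p + q^3 \cdot p + q^4 \cdot p + \cdots,$$

where the right-hand side is ∑ P(X = x) = 1. So (1 − q) · E(X) = 1, hence

$$E(X) = \frac{1}{p}.$$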


(The reader is cautioned that the “trick” above for summing the series is valid only because 
the series is absolutely convergent. E(X) could also be found by integrating, with respect to 
q, the series for E(X) term by term.) 

With a fair coin, then, since p = 1/2, an average of two tosses is necessary to find the 
first occurrence of heads. Since P(X = x) involves a geometric series, X here, as in Example 
2.1.6, is often called a geometric random variable. 
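
The value 1/p can be checked numerically by truncating the series. The following Python sketch is ours and only approximates the infinite sums; the function name and the cutoff of 2000 terms are arbitrary choices, adequate when p is not close to 0.

```python
def geometric_partial_sums(p, terms=2000):
    """Partial sums of sum P(X = x) and sum x * P(X = x), where P(X = x) = q**(x - 1) * p.

    `terms` is an arbitrary truncation point chosen for illustration.
    """
    q = 1.0 - p
    total_prob = 0.0
    mean = 0.0
    for x in range(1, terms + 1):
        prob = q ** (x - 1) * p
        total_prob += prob
        mean += x * prob
    return total_prob, mean

total_prob, mean = geometric_partial_sums(0.5)
print(round(total_prob, 6), round(mean, 6))  # 1.0 2.0
```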

Mean values generally show a central value for the random variable. Now we turn to a 
discussion of the dispersion, or variability, of the random variable.


Variance of a Random Variable


Figure 2.8 shows two random variables with the same mean value, μ = 3. These graphs are
continuous; the reader may regard them as idealized discrete random variables. Continuous
random variables will be discussed in Chapter 3. If we did not know μ and wanted to estimate
μ by selecting an observation from one of these probability distributions, we would
no doubt choose Y since the values of Y are less dispersed and generally closer to μ than
those for X.

There are many ways to measure the fact that Y is less dispersed than X. We could look at
the range (the largest possible value minus the smallest possible value); another possibility
is to calculate the deviation of each value of X from μ and then calculate the average value of
these deviations from the mean, E(X − μ). This, however, is 0 for any random variable and
hence carries absolutely no information whatsoever regarding X. Here is a demonstration
that this is so:

$$E(X - \mu) = \sum_x (x - \mu) \cdot P(X = x) = \sum_x x \cdot P(X = x) - \mu \cdot \sum_x P(X = x) = \mu - \mu = 0.$$

So the positive deviations from the mean exactly compensate for the negative deviations.
One way to avoid this is to consider the mean deviation, E|X − μ|, but this is not commonly
done. Yet another way to prevent the positive deviations from compensating for the
negative deviations is to square each value of X − μ and then average the results, weighting
each by its probability. This is the usual solution; we call the result the variance, denoted
by σ², which we define as


Definition:

$$\sigma^2 = \operatorname{Var}(X) = E\left[(X - \mu)^2\right], \quad \text{so} \quad \sigma^2 = \sum_x (x - \mu)^2 \cdot P(X = x), \tag{2.1}$$

provided the sum converges, and where the summation is over all the possible values of X.

Figure 2.8 Two random variables with the same mean value.

The quantity σ² is then a weighted average of the squared deviations of the values of X
from its mean value. The variance may appear to be much more complex than the range or
mean deviation. This is true, but the variance also has remarkable properties that we cannot
describe now and which do not hold for the range or for the mean deviation.


Example 2.3.6 


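Consider the random variable X with probability distribution function:

$$f(x) = \begin{cases} 1/2 & \text{if } x = 1 \\ 1/3 & \text{if } x = 2 \\ 1/6 & \text{if } x = 3 \end{cases}$$

Here we have

$$\mu = 1 \cdot \frac{1}{2} + 2 \cdot \frac{1}{3} + 3 \cdot \frac{1}{6} = \frac{5}{3},$$

so

$$\sigma^2 = \left(1 - \frac{5}{3}\right)^2 \cdot \frac{1}{2} + \left(2 - \frac{5}{3}\right)^2 \cdot \frac{1}{3} + \left(3 - \frac{5}{3}\right)^2 \cdot \frac{1}{6} = \frac{5}{9}.$$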


Before turning to some more examples, we show another formula for σ². This formula is
often very useful. Expand (2.1) as follows:

$$\begin{aligned}
\sigma^2 &= \sum_x (x - \mu)^2 \cdot P(X = x) \\
&= \sum_x (x - \mu)^2 \cdot f(x) \\
&= \sum_x (x^2 - 2\mu x + \mu^2) \cdot f(x) \\
&= \sum_x x^2 \cdot f(x) - 2\mu \sum_x x \cdot f(x) + \mu^2,
\end{aligned}$$

since ∑ f(x) = 1. Now ∑ x · f(x) = μ, so

$$\sigma^2 = \sum_x x^2 \cdot f(x) - \mu^2. \tag{2.2}$$

So σ² = E(X²) − μ² = E(X²) − [E(X)]².

Formula (2.2) is often easier to use for computational purposes than formula (2.1). σ is
called the standard deviation of X.



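Example 2.3.7


Refer again to throwing a single die, as in Examples 2.1.1 and 2.2.2. We calculate

$$E(X^2) = 1^2 \cdot \frac{1}{6} + 2^2 \cdot \frac{1}{6} + 3^2 \cdot \frac{1}{6} + 4^2 \cdot \frac{1}{6} + 5^2 \cdot \frac{1}{6} + 6^2 \cdot \frac{1}{6} = \frac{91}{6},$$

so that

$$\sigma^2 = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12}.$$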
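
Formulas (2.1) and (2.2) must give the same variance, and this is easy to confirm for the fair die. The sketch below is ours; the two function names are illustrative and do not come from the text. Exact fractions are used so that the two results can be compared for equality.

```python
from fractions import Fraction

def variance_by_definition(pmf):
    """Formula (2.1): the sum of (x - mu)**2 * f(x)."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_by_shortcut(pmf):
    """Formula (2.2): E(X**2) - mu**2."""
    mu = sum(x * p for x, p in pmf.items())
    return sum(x * x * p for x, p in pmf.items()) - mu ** 2

fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
print(variance_by_definition(fair_die), variance_by_shortcut(fair_die))  # 35/12 35/12
```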


Example 2.3.8


What is the variance of the geometric random variable whose probability distribution
function is $P(X = x) = q^{x-1} \cdot p$, x = 1, 2, 3, ...?

Starting with σ² = E(X²) − μ², since we know that μ = 1/p, we only need to compute
E(X²):

$$E(X^2) = \sum_{x=1}^{\infty} x^2 q^{x-1} p = p \cdot (1^2 + 2^2 q + 3^2 q^2 + \cdots),$$

from which no easily seen pattern emerges.
Another thought is to consider E[X(X − 1)]. If we write

$$E[X(X - 1)] = \sum_{x=1}^{\infty} (x^2 - x) \cdot P(X = x),$$

we see that

$$E[X(X - 1)] = \sum_{x=1}^{\infty} x^2 \cdot P(X = x) - \sum_{x=1}^{\infty} x \cdot P(X = x), \quad \text{or}$$

$$E(X^2 - X) = E(X^2) - E(X).$$


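So if we know E[X(X − 1)], we can find E(X²) and hence calculate σ². In this example, a
trick will help as it did in determining E(X):

$$E[X(X - 1)] = 1 \cdot 0 \cdot p + 2 \cdot 1 \cdot q \cdot p + 3 \cdot 2 \cdot q^2 \cdot p + 4 \cdot 3 \cdot q^3 \cdot p + \cdots$$

so multiplying through by q, we have

$$q \cdot E[X(X - 1)] = 2 \cdot 1 \cdot q^2 \cdot p + 3 \cdot 2 \cdot q^3 \cdot p + 4 \cdot 3 \cdot q^4 \cdot p + \cdots$$

Subtract the second series from the first series and, since p = 1 − q, it follows that

$$p \cdot E[X(X - 1)] = 2 \cdot q \cdot p + 4 \cdot q^2 \cdot p + 6 \cdot q^3 \cdot p + \cdots = 2q \cdot (1 \cdot p + 2 q p + 3 q^2 p + \cdots).$$

Thus,

$$p \cdot E[X(X - 1)] = 2q \cdot E(X) = \frac{2q}{p}.$$

So

$$E[X(X - 1)] = \frac{2q}{p^2} \quad \text{and} \quad E(X^2) = \frac{2q}{p^2} + \frac{1}{p},$$

giving

$$\sigma^2 = \frac{2q}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{q}{p^2}.$$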
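
As a sanity check on this algebra, the series can be truncated and summed numerically. The following Python sketch is ours, with an illustrative function name and an arbitrary cutoff of 5000 terms; it approximates E(X), E[X(X − 1)], and the variance, which should be close to 1/p, 2q/p², and q/p².

```python
def geometric_moments(p, terms=5000):
    """Approximate E(X), E[X(X - 1)] and Var(X) by truncating the series.

    `terms` is an arbitrary cutoff chosen for illustration.
    """
    q = 1.0 - p
    mean = 0.0
    fact2 = 0.0  # running value of E[X(X - 1)]
    for x in range(1, terms + 1):
        prob = q ** (x - 1) * p
        mean += x * prob
        fact2 += x * (x - 1) * prob
    second = fact2 + mean  # E(X**2) = E[X(X - 1)] + E(X)
    return mean, fact2, second - mean ** 2

p = 0.25
mean, fact2, var = geometric_moments(p)
print(round(mean, 6), round(fact2, 6), round(var, 6))  # 4.0 24.0 12.0
```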


The value of the variance is quite difficult to interpret at this point but, as we proceed, we 
will find more and more uses for the variance. Patience is requested of the reader now, with 
the promise that these calculations are in fact useful and meaningful. We pause to consider 
the question, “Does σ measure variability?” We can show a general result, albeit a very
crude one, in the following inequality. 


Tchebycheff’s Inequality 


Theorem 1: Suppose the random variable X has mean μ and standard deviation σ. Choose
a positive quantity, k. Then,

$$P(|X - \mu| \le k \cdot \sigma) \ge 1 - \frac{1}{k^2}.$$

Tchebycheff’s inequality gives a lower bound on the probability a value of X is within k · σ
units of the mean, μ.
Before offering a proof, we consider some special cases. If k = 2, the inequality is

$$P(|X - \mu| \le 2 \cdot \sigma) \ge 1 - \frac{1}{4} = \frac{3}{4},$$

so at least 3/4 of any probability distribution lies within two standard deviations, that is,
2σ units of the mean while, if k = 3, the inequality states that

$$P(|X - \mu| \le 3 \cdot \sigma) \ge 1 - \frac{1}{9} = \frac{8}{9},$$

showing that at least 8/9 of any probability distribution lies within 3σ units of the mean. We
will see later that if the specific distribution is known, these inequalities can be sharpened
considerably. Now we show a proof.


Proof: Let P(X = x) = f(x). Consider two sets of points:

$$A = \{x : |x - \mu| > k \cdot \sigma\} \quad \text{and} \quad B = \{x : |x - \mu| \le k \cdot \sigma\}.$$

We could then write the variance as

$$\sigma^2 = \sum_{x \in A} (x - \mu)^2 \cdot f(x) + \sum_{x \in B} (x - \mu)^2 \cdot f(x).$$

Now for every point x in A, replace |x − μ| by k · σ and in B replace |x − μ| by 0. The
crudity of the result is now evident! So

$$\sigma^2 \ge \sum_{x \in A} k^2 \sigma^2 \cdot f(x) + \sum_{x \in B} 0^2 \cdot f(x).$$

Since $\sum_{x \in A} f(x) = P(A) = P(|X - \mu| > k \cdot \sigma)$,

$$\sigma^2 \ge k^2 \cdot \sigma^2 \cdot P(|X - \mu| > k \cdot \sigma),$$

from which we conclude that

$$P(|X - \mu| > k \cdot \sigma) \le \frac{1}{k^2} \quad \text{or} \quad P(|X - \mu| \le k \cdot \sigma) \ge 1 - \frac{1}{k^2}.$$

While the theorem is far from precise, it does verify that as we move farther away from the
mean, in terms of standard deviations, the more of the probability distribution we cover;
hence σ is indeed a measure of variability.
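
The bound can be compared with the exact probability for any finite distribution. The sketch below is ours; the function name `chebyshev_check` is illustrative, and k is assumed to be a positive integer so that 1/k² can be held as an exact fraction. It works with squared distances to avoid taking the square root of a fraction.

```python
from fractions import Fraction

def chebyshev_check(pmf, k):
    """Return (P(|X - mu| <= k * sigma), 1 - 1/k**2) for a finite pmf {x: probability}.

    k is assumed to be a positive integer (an assumption of this sketch).
    """
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    # |x - mu| <= k * sigma is equivalent to (x - mu)**2 <= k**2 * sigma**2.
    inside = sum(p for x, p in pmf.items() if (x - mu) ** 2 <= k * k * var)
    return inside, 1 - Fraction(1, k * k)

fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
for k in (2, 3):
    actual, bound = chebyshev_check(fair_die, k)
    print(k, actual, bound)
```

For the fair die every face lies within two standard deviations of the mean, so the exact probability is well above the bound, which illustrates how crude the inequality can be.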


EXERCISES 2.3 


1. If X is the outcome when a loaded die with P(X = x) = x/21 for x = 1, 2, 3, 4, 5, 6, find
μ and σ².

2. Verify Tchebycheff’s inequality in problem 1.

3. A small manufacturing firm sells 1 machine per month with probability 0.3; it sells
2 machines per month with probability 0.1; it never sells more than 2 machines per
month. If X represents the number of machines sold per month,

(a) find the mean and variance of X.

(b) If the monthly profit is 2X² + 3X + 1 (in thousands of dollars), find the expected
monthly profit.

4. Bolts are packaged in boxes so that the mean number of bolts per box is 100 with
standard deviation 3. Use Tchebycheff’s inequality to find a bound on the probability
that the box has between 95 and 105 bolts.

5. Graduates of a distinguished undergraduate mathematics program received graduate
school fellowships as follows: 20% received $10,000; 10% received $12,000; 30%
received $14,000; 30% received $13,000; 5% received $15,000; and 5% received
$17,000. Find the mean and the variance of the value of a graduate fellowship.

6. A fair coin is tossed four times; let X denote the number of heads that occur. Find the
mean and variance of X.

7. A batch of 15 electric motors actually contains three defective motors. An inspector
chooses 3 (without replacement). Find the mean and variance of X, the number of
defective motors in the sample.

8. A coin, loaded to show heads with probability 2/3, is tossed until heads appear or until
5 tosses have been made. Let X denote the number of tosses made. Find the mean and
variance of X.

9. Suppose X is a discrete uniform random variable so that f(x) = 1/n, x = 1, 2, 3, ..., n.
Find the mean and variance of X.

10. In problem 5, suppose the batch of motors is accepted if no more than 1 defective
motor is in the sample. If each motor costs $100 to manufacture, how much should the
manufacturer charge for each motor in order to make the expected profit for the batch
be $200?

11. A physicist makes several independent measurements of the specific gravity of a
substance. The limitations of his equipment are such that the standard deviation of each
measurement is σ units. Suppose μ is the true specific gravity of the substance.
Approximate the probability that one of the measurements is within 5σ/4 units of μ.

12. A manufacturer ships parts in lots of 1000 and makes a profit of $50 per lot sold. The
purchaser, however, subjects the product to a sampling inspection plan as follows: 10
parts are selected at random. If none of these parts is defective, the lot is purchased;
if 1 part is defective, the manufacturer returns $10 to the buyer; if 2 or more parts are
found to be defective, the entire lot is returned at a net loss of $25 to the manufacturer.
What is the manufacturer’s expected profit if 10% of the parts are defective? (Assume
that the sampling is done with replacement.)

13. In a lot of six batteries, one is worn out. A technician tests the batteries one at a time
until the worn out battery is found. Tested batteries are put aside, but after every third
test the tester takes a break and another worker, unaware of the test, returns one of the
tested batteries to the set of batteries not yet tested.

(a) Find the probability distribution for X, the number of tests required to identify the
worn out battery.

(b) Assume the first test of each set of three tests costs $5, and that each of the next
two tests in each set of three tests costs $2. Find the increase in the expected cost
of locating the worn out battery due to the unaware worker.

14. A carnival game consists of hitting a lever with a sledge hammer to propel a weight
upward toward a bell. Because the hammer is quite heavy, the chance of ringing the bell
declines with the number of attempts; in particular, the probability of ringing the bell
on the ith attempt is (3/4)^i. For a fee, the carnival sells you the privilege of swinging the
hammer until the bell rings or until you have made three attempts, whichever occurs
first.

(a) Find the probability distribution of X, the number of hits taken.

(b) The prize for ringing the bell on the ith try is $(4 − i), i = 1, 2, 3. How much should
the carnival charge for playing the game if it wants an expected profit of $1 per
customer?

15. Suppose X is a random variable defined on the points x = 0, 1, 2, 3, ... Calculate

$$\sum_{x=0}^{\infty} P(X > x).$$

There are many very important specific discrete probability distribution functions that 
arise in practical applications. Having established some general properties, we now 
turn to discussions of several of the most important of these distributions. 

Occasionally, random variables in apparently different situations actually arise
from common assumptions and hence lead to the same probability distribution function.
We now investigate some of these special circumstances and the probability
distribution functions which result.


2.4 BINOMIAL DISTRIBUTION


Among all discrete probability distribution functions, the most commonly occurring one, 
arising in a great variety of applications, is called the binomial probability distribution 
function. It is attributed to James Bernoulli. 

Consider an experiment where, on each trial of the experiment, one of only two out- 
comes occurs, which we describe as success (S) or failure (F). For example, a manufactured 
part is either good or does not meet specifications; a student’s examination score is pass- 
ing or it is not; a team wins a basketball game or it does not—these are some examples 
of binomial variables and the reader can no doubt think of many more. One of these out- 
comes can be associated with success and the other with failure; it does not matter which is 
which. 

In addition to the restriction that there be two and only two outcomes on each trial of 
the experiment, suppose further that the trials are independent and that the probabilities of 
success or failure at each trial remain constant from trial to trial and do not change with 
subsequent performances of the experiment. 

The individual trials of such an experiment are often called Bernoulli trials. 

Consider, as a specific example, 5 independent trials with probability 2/3 of success at
any trial. Then, if interest centers on the occurrence of exactly 3 successes, we note that
exactly 3 successes can occur in 10 different ways:


SSSFF, SSFSF, SFSSF, FSSSF, SFSFS,
SSFFS, FSSFS, SFFSS, FSFSS, FFSSS.


There are $\binom{5}{3} = 10$ of these mutually exclusive orders. Each has probability
$\left(\frac{2}{3}\right)^3 \cdot \left(\frac{1}{3}\right)^2$, so

$$P(\text{Exactly 3 S's in 5 trials}) = \binom{5}{3} \left(\frac{2}{3}\right)^3 \cdot \left(\frac{1}{3}\right)^2 = \frac{80}{243}.$$


Now return to the general situation. Let the probabilities be P(S) = p and P(F) = q = 1 − p,
and let the random variable X denote the number of successes in n trials of the experiment.
Any specific sequence of exactly x successes and n − x failures has probability $p^x \cdot q^{n-x}$.
The positions of the successes in such a sequence can be chosen in $\binom{n}{x}$ ways so, since
the sequences are mutually exclusive,

$$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n, \tag{2.3}$$


giving the probability distribution function for a binomial random variable. 

Although the binomial random variable occurs in many different situations, a perfect
model for any binomial situation is that of observing the number of heads when a
coin loaded so that the probability of heads is p and that of tails is q = 1 − p is tossed n
times.

Now does (2.3) define a probability distribution? Since P(X = x) ≥ 0 and

$$\sum_{x=0}^{n} P(X = x) = \sum_{x=0}^{n} \binom{n}{x} p^x \cdot q^{n-x} = (q + p)^n = 1$$

by the binomial theorem, we conclude that (2.3) defines a probability distribution.
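
Formula (2.3) translates directly into a few lines of code. The following Python sketch is ours, not the text's; the function name `binomial_pmf` is illustrative, and the values n = 5 and p = 2/3 are chosen only as an example. With exact fractions the probabilities sum to exactly 1.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] computed from formula (2.3)."""
    q = 1 - p
    return [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]

pmf = binomial_pmf(5, Fraction(2, 3))  # illustrative values of n and p
print(pmf[3])    # 80/243
print(sum(pmf))  # 1
```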

It is interesting to note then that individual terms in the binomial expansion of $(q + p)^n$,
if p + q = 1, represent binomial probabilities.


Example 2.4.1


A student has no knowledge whatsoever of the material to be tested on a true-false
examination, so he flips a fair coin in order to determine his response to each question. What is
the probability he scores at least 60% on a ten-item examination?

Here, the binomial variable X, the number of correct responses, has n = 10 and p = q =
1/2. We need

$$P(X \ge 6) = \sum_{x=6}^{10} \binom{10}{x} \left(\frac{1}{2}\right)^{10}.$$

Now we find that P(X ≥ 6) = 193/512 = 0.376953.


The above-mentioned calculations can easily be done with a pocket computer. If we want 
to investigate the probability that at least 60% of the questions were answered correctly as 
the number of items on the examination increases, then use of a computer algebra system 
is recommended for aiding in the calculation. Many computer algebra systems contain the 
binomial probability distribution as a defined probability distribution; for other systems the 
probability distribution function may be entered directly. The following results can be found 
where n is the number of trials and P is the probability of at least 60% correct: 


n 10 40 80 100 
P 0.376953 0.134094 0.0464559 0.028444 


Clearly, guessing is not a sensible strategy on a test with a large number of items. 
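
A computer algebra system is not essential for these values; exact integer arithmetic suffices. The sketch below is ours, with an illustrative function name. It counts the favorable outcomes among the 2^n equally likely answer patterns and should reproduce the entries in the table above.

```python
from math import comb

def prob_at_least_60_percent(n):
    """P(X >= 0.6 * n) for X binomial with n trials and p = 1/2."""
    need = (3 * n + 4) // 5  # ceiling of 0.6 * n, computed in integer arithmetic
    favorable = sum(comb(n, x) for x in range(need, n + 1))
    return favorable / 2 ** n

for n in (10, 40, 80, 100):
    print(n, round(prob_at_least_60_percent(n), 6))
```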


Example 2.4.2 


Graphs of $P(X = x) = \binom{n}{x} p^x \cdot q^{n-x}$ for x = 0, 1, 2, ..., n are interesting. The graphs of
P(X = x) for n = 10 and also for n = 100 with p = 1/2 in each case are shown in Figure 2.9.
We see that each curve is bell-shaped or normal-like, and the distributions are symmetric
about x = 5 and x = 50, respectively.

Again we find the bell-shaped or normal appearance here, but the reader may wonder
if the appearance is still normal for p ≠ 1/2. Figure 2.10 shows a graph of P(X = x) for
n = 50 and p = 3/4. This curve indicates that the bell shape survives even though p ≠ 1/2.
The maximum point on the curve has shifted to the right, however.

We will discuss the reason for the normal appearance of the binomial distribution in 
the next chapter. Appendix A contains a procedure for selecting a sample from a binomial 
distribution and for simulating an experiment consisting of flipping a loaded coin. 


2.5 A RECURSION 


If a computer algebra system is not available, calculating values of $P(X = x) = \binom{n}{x} p^x \cdot q^{n-x}$
can certainly become difficult, especially for large values of n and small values of p.
In any event, $\binom{n}{x}$ becomes large while $p^x \cdot q^{n-x}$ becomes small. By calculating the
ratio of successive terms we find an interesting result, which will aid in making these
calculations (and which has other interesting consequences as well).

Figure 2.9 (a) Binomial distribution, n = 10, p = 1/2. (b) Binomial distribution, n = 100, p = 1/2.

Figure 2.10 Binomial distribution, n = 50, p = 3/4.

We divide P(X = x) by P(X = x − 1):

$$\frac{P(X = x)}{P(X = x - 1)} = \frac{\binom{n}{x} p^x \cdot q^{n-x}}{\binom{n}{x-1} p^{x-1} \cdot q^{n-x+1}}.$$

This can be simplified to

$$\frac{P(X = x)}{P(X = x - 1)} = \frac{n - x + 1}{x} \cdot \frac{p}{q}, \quad \text{or}$$

$$P(X = x) = \frac{n - x + 1}{x} \cdot \frac{p}{q} \cdot P(X = x - 1), \quad x = 1, 2, \ldots, n. \tag{2.4}$$
x q 


Formula (2.4) is another example of a recursion since it expresses one value of a function,
here P(X = x), in terms of another value of the function, here P(X = x − 1). Given a starting
point and the recursion, any value of the function can be computed. In this case, since n
failures have probability $q^n$, $P(X = 0) = q^n$ is a natural starting value. We then find that

$$P(X = 0) = q^n,$$

so

$$P(X = 1) = n \cdot \frac{p}{q} \cdot P(X = 0) = n \cdot \frac{p}{q} \cdot q^n = \binom{n}{1} p \cdot q^{n-1}$$

and

$$P(X = 2) = \frac{n - 1}{2} \cdot \frac{p}{q} \cdot P(X = 1) = \frac{n - 1}{2} \cdot \frac{p}{q} \cdot n \cdot p \cdot q^{n-1} = \binom{n}{2} p^2 \cdot q^{n-2},$$

and so on, giving the expected result that $P(X = x) = \binom{n}{x} p^x \cdot q^{n-x}$, x = 0, 1, ..., n. So we can
recover the probability distribution function from the recursion.

Recursions can be easily programmed and recursions such as (2.4) are also of some
interest for theoretical purposes. For example, consider locating the maximum, or most
frequently occurring value, of P(X = x).

If we require that P(X = x) ≥ P(X = x − 1), then, from (2.4), $\frac{n - x + 1}{x} \cdot \frac{p}{q} \ge 1$.

This reduces to x ≤ p · (n + 1), so we can conclude that the value of X with the maximum
probability is X = ⌊p · (n + 1)⌋, where ⌊x⌋ denotes the largest integer not exceeding x.
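
A short program shows how little is needed to use the recursion. The following Python sketch is ours; the function name is illustrative, and floating-point arithmetic is used, so the results are approximate. It builds the whole distribution from the starting value and recursion (2.4), then locates the most probable value for n = 50 and p = 3/4, the case graphed in Figure 2.10.

```python
def binomial_by_recursion(n, p):
    """Build P(X = 0), ..., P(X = n) from P(X = 0) = q**n and recursion (2.4)."""
    q = 1.0 - p
    probs = [q ** n]
    for x in range(1, n + 1):
        probs.append(probs[-1] * (n - x + 1) / x * p / q)
    return probs

probs = binomial_by_recursion(50, 0.75)
# Index of the largest probability, compared with the largest integer in p * (n + 1).
mode = max(range(len(probs)), key=probs.__getitem__)
print(mode, int(0.75 * 51))  # 38 38
```

No binomial coefficients are computed explicitly, which avoids the very large and very small factors mentioned at the start of this section.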


The Mean and Variance of the Binomial 


The recursion (2.4) can be used to determine the mean and variance of a binomial random
variable. Consider first $\mu = \sum_{x=0}^{n} x \cdot P(X = x)$. Recursion (2.4) is

$$P(X = x) = \frac{n - x + 1}{x} \cdot \frac{p}{q} \cdot P(X = x - 1), \quad x = 1, 2, \ldots, n.$$

Multiplying through by x and summing from 1 to n gives

$$\sum_{x=1}^{n} x \cdot P(X = x) = \sum_{x=1}^{n} [n - (x - 1)] \cdot \frac{p}{q} \cdot P(X = x - 1), \quad \text{so}$$

$$\mu = \frac{p}{q} \cdot n \cdot [1 - P(X = n)] - \frac{p}{q} \cdot \sum_{x=1}^{n} (x - 1) \cdot P(X = x - 1), \quad \text{or}$$

$$\mu = \frac{p}{q} \cdot n \cdot (1 - p^n) - \frac{p}{q} \cdot [\mu - n \cdot P(X = n)],$$

which reduces to μ = n · p.

This result makes a good deal of intuitive sense: if we toss a coin, loaded to come up heads
with probability 3/4, 1000 times, we expect 1000 · (3/4) = 750 heads. So in n trials of a
binomial experiment with p as the probability of success, we expect n · p successes.

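The variance can also be found using (2.4). We first calculate E(X²):

$$E(X^2) = \sum_{x=0}^{n} x^2 \cdot P(X = x) = \sum_{x=1}^{n} x \cdot \frac{p}{q} \cdot [n - (x - 1)] \cdot P(X = x - 1)$$

$$= n \cdot \frac{p}{q} \cdot \sum_{x=1}^{n} [(x - 1) + 1] \cdot P(X = x - 1) - \frac{p}{q} \cdot \sum_{x=1}^{n} x \cdot (x - 1) \cdot P(X = x - 1).$$

Then since $\sum_{x=1}^{n} (x - 1) \cdot P(X = x - 1) = \mu - n \cdot P(X = n)$ and since

$$\sum_{x=1}^{n} x \cdot (x - 1) \cdot P(X = x - 1) = \sum_{x=1}^{n} [(x - 1)^2 + (x - 1)] \cdot P(X = x - 1),$$

it follows that

$$E(X^2) = p \cdot (n - 1) \cdot (np - np^n) + n \cdot p \cdot (1 - p^n) + n^2 \cdot p^{n+1},$$

and this reduces to E(X²) = np²(n − 1) + np. Therefore,

$$\sigma^2 = E(X^2) - [E(X)]^2 = np^2(n - 1) + np - (np)^2 = npq.$$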
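
The formulas μ = np and σ² = npq can be confirmed for particular values of n and p by summing over the distribution directly. The sketch below is ours, with an illustrative function name; it uses exact fractions, so the comparison is exact rather than approximate.

```python
from fractions import Fraction
from math import comb

def binomial_mean_and_variance(n, p):
    """Compute mu and sigma**2 directly from the binomial pmf (2.3)."""
    q = 1 - p
    pmf = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]
    mu = sum(x * f for x, f in enumerate(pmf))
    second = sum(x * x * f for x, f in enumerate(pmf))  # E(X**2)
    return mu, second - mu ** 2

n, p = 12, Fraction(3, 4)  # illustrative values
print(binomial_mean_and_variance(n, p))  # (Fraction(9, 1), Fraction(9, 4))
```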


Example 2.5.1 


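We apply the above-mentioned results to a binomial experiment in which p = q = 1/2
and n = 100 trials. Here, E(X) = μ = n · p = 50 and σ² = n · p · q = 25. Tchebycheff’s
inequality with k = 3 then gives

$$P\left[n \cdot p - k \cdot \sqrt{n \cdot p \cdot q} \le X \le n \cdot p + k \cdot \sqrt{n \cdot p \cdot q}\right] \ge 1 - \frac{1}{k^2},$$

so

$$P[50 - 3 \cdot 5 \le X \le 50 + 3 \cdot 5] \ge \frac{8}{9}, \quad \text{or} \quad P[35 \le X \le 65] \ge \frac{8}{9}.$$

But we find exactly that

$$\sum_{x=35}^{65} \binom{100}{x} \left(\frac{1}{2}\right)^{100} = 0.99821,$$

verifying Tchebycheff’s inequality in this case.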
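
The exact probability quoted here takes only a few lines to compute. The following Python sketch is ours; it sums the binomial coefficients for 35 through 65 and compares the result with the Tchebycheff bound for k = 3.

```python
from math import comb

n = 100
# P(35 <= X <= 65) for X binomial with n = 100 and p = 1/2, from exact integer counts.
exact = sum(comb(n, x) for x in range(35, 66)) / 2 ** n
bound = 1 - 1 / 3 ** 2  # Tchebycheff's lower bound with k = 3

print(round(exact, 5), round(bound, 5))
```

The gap between the two numbers shows how conservative the inequality is when the distribution is known.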


EXERCISES 2.5


1. Suppose a binomial random variable X assumes only the values 0 and 1 and P(X =
1) = p. Verify the mean and variance of X directly.

2. For a binomial random variable with probability p and n = 5, find all the probabilities
for the probability distribution function and draw a graph of them.

3. A test is conducted to determine the concentration of a chemical in a lawn weed killer,
which will effectively kill dandelions. It is found that a given concentration of the
chemical will kill on average 80% of the dandelions in 24 hours. A test is performed on 20
dandelions. Find the probability that

(a) exactly 14 are killed in 24 hours.

(b) at least 10 are killed in 24 hours.

4. A fair die is rolled 240 times. Find the probability that the number of 2’s or 3’s is
between 75 and 83, inclusive.

5. A manufacturer of dry cells actually makes two batteries that appear to be identical.
Batteries of Type A last more than 600 hours with probability 0.30 and batteries of
Type B last more than 600 hours with probability 0.40.

(a) What is the probability that 5 out of 10 of Type A batteries last more than 600
hours?

(b) Of 50 Type B batteries, how many are expected to last at least 600 hours?

(c) What is the probability that three Type A batteries have more batteries lasting 600
hours than two Type B batteries?

6. X and Y play the following game. X tosses 2 fair coins and Y tosses 3. The player
throwing the greater number of heads wins. In case of a tie, the throws are repeated
until a winner is determined.

(a) What is the probability that X wins on the first play?

(b) What is the probability that X wins the game?

7. In a political race, it is known that 40% of the voters favor candidate C. In a random
sample of 100 voters, what is the probability that

(a) between 30 and 45 voters favor C?

(b) exactly 36 voters favor C?

8. A gambling game is played as follows. A player, who pays $4 to play the game, tosses
a fair coin five times. The player wins as many dollars as heads are tossed.

(a) Find the probability distribution for N, the player’s net winnings.

(b) Find the mean and variance of the player’s net winnings.

9. A red die is fair and a green die is loaded so that the probability it comes up 6 is 1/10.

(a) What is the probability of rolling exactly 3 sixes in 3 rolls with the red die?

(b) What is the probability of at least 30 sixes in 100 rolls of the red die?

(c) The green die is thrown five times and the red die is thrown four times. Find the
probability that a total of 3 sixes occurs.

10. What is the probability of one head twice in three tosses of four fair coins?

11. A commuter’s drive to work includes seven stoplights. Assume the probability a light
is red when the commuter reaches it is 0.20 and that the lights are far enough apart to
operate independently.

(a) If X is the number of red lights the commuter stops for, find the probability
distribution function for X.

(b) Find P(X > 5).

(c) Find P(X > 5|X > 3).

12. The probability of being able to log on a computer system from a remote terminal
during a busy period is 0.7. Suppose that 10 independent attempts are made and that X
denotes the number of successful attempts.

(a) Write an expression for the probability distribution function, f(x).

(b) Find P(X > 5).

(c) Now suppose that Y represents the number of attempts up to and including the first
successful attempt. Write an expression for the probability distribution function,
g(y).

13. An experimental rocket is launched five times. The probability of a successful launch
is 0.9. Let X denote the number of successful launches. A study has shown that the net
cost of the experiment, in thousands of dollars, is 2 − 3X². Find the expected net cost
of the experiment.

14. Twenty percent of the integrated circuit (IC) chips made in a plant are defective.
Assume that a binomial model is appropriate.

(a) Find the probability that at most 13 defective chips occur in a sample of 100.

(b) Find the probability that two samples, each of size 100, will have a total of exactly
26 defective chips.

15. A coin, loaded to come up heads with probability 2/3, is tossed five times. If the number
of heads is odd, the player is paid $5. If the number of heads is 2 or 4 the player wins
nothing; if no heads occur, the player tosses the coin five more times and wins, in
dollars, the number of heads thrown. If the game costs the player $3 to play, find the
probability distribution of N, his net winnings.

16. (a) Show that the probability of being dealt a full house (3 cards of one value and 2
cards of another value) in poker is about 0.0014.

(b) Find the probability that in 1000 hands of poker, you will be dealt at least 2 full
houses.

17. An airline knows that 10% of the people holding reservations on a given flight will not
appear. The plane holds 90 people.

(a) If 95 reservations have been sold, find the probability that the airline will be able
to accommodate everyone appearing for the flight.

(b) How many reservations should be sold so that the airline can accommodate
everyone who appears for the flight 99% of the time?

18. The probability an individual seed of a certain type will germinate is 0.9. A nurseryman
sells flats of this type of plant and wants to “guarantee” (with probability 0.99) that at
least 100 plants in the flat will germinate. How many plants should he put in each flat?

19. A coin with P(H) = 1/2 is flipped four times, and then a coin with P(H) = 2/3 is tossed
twice. What is the probability that a total of five heads occurs?

20. (a) Each of two persons tosses three fair coins. What is the probability that each gets
the same number of heads?

(b) In part (a), what is the probability that X_1 + X_2 is odd, where X_1 is the number
of heads the first person tosses and X_2 is the number of heads the second person
tosses?

(c) Repeat part (a) if each person tosses n fair coins. Simplify the result as much as
possible.

21. Find the probability that more than 520 heads occur in 1000 tosses of a fair coin. 


22. How many times must a fair coin be tossed if the probability of obtaining at least 40 
heads is at least 0.95? 


23. Samples of 100 are selected each hour from a production process that produces items, 
20% of which are defective. 


(a) What is the probability that at most 15 defectives are found in an hour? 
(b) What is the probability that a total of 47 defectives is found in the first 2 hours? 


24. A small engineering college would like to have an entering class of 360 students. Past 
data indicate that 85% of those accepted actually enroll in the class. How many students 
should be accepted if the probability the class will be at least 360 is to be approximately 
0.95? 


25. A fair coin is tossed repeatedly. What is the probability the number of heads tossed 
reaches 6 before the number of tails tossed reaches 4? 


26. Evaluate the sums

$$\sum_{x=0}^{n} x \cdot \binom{n}{x} p^x \cdot q^{n-x} \quad \text{and} \quad \sum_{x=0}^{n} x \cdot (x - 1) \cdot \binom{n}{x} p^x \cdot q^{n-x}$$

directly and use these to verify the formulas for μ and σ² for the binomial distribution.
[Note that $\sum_{x=0}^{n} \binom{n}{x} p^x \cdot (1 - p)^{n-x} = [p + (1 - p)]^n = 1$.]

27. In problem 6, show that the game is fair if X wins if he tosses at least as many heads as
Y.


2.6 SOME STATISTICAL CONSIDERATIONS 


We pause here and in the next section to show some statistical applications of the proba- 
bility theory we have developed so far. From time to time in this book, we will show some 
applications of probability theory to statistics and the statistical analysis of data as well as 
to other applied situations; this is our first consideration of statistical problems. 

From the previous section, we know what can happen when n observations are taken 
from a binomial distribution with known parameter p. Generally, however, p is unknown. 
We might, for example, be interested in the proportion of unacceptable items arising from 
a production line. Usually, this proportion would not be known. So we suppose now that p 
is unknown. How can we estimate the unknown p? We certainly would observe the bino- 
mial process the production line represents; the result of this would be a number of good 
items from the process, say X, and we would surely use X in some way to estimate p. How 
precisely can we use X to estimate p?

It would appear natural to estimate p by the proportion of good items in the sample, X/n.
Since X is a random variable, so is X/n. We can calculate the expected value of this random
variable as follows:

$$E\left(\frac{X}{n}\right) = \sum_{x=0}^{n} \frac{x}{n} \cdot P(X = x) = \frac{1}{n} \sum_{x=0}^{n} x \cdot P(X = x) = \frac{1}{n} \cdot n \cdot p = p.$$

This indicates that, on average, our estimate for p gives the true value, p. We say that our
estimator, X/n, is an unbiased estimator for p.
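
Unbiasedness is a statement about the long-run average of X/n, and a small simulation can make it concrete. The sketch below is ours and is only illustrative: the function name, the number of samples, and the seed are arbitrary choices, and a simulation can suggest the result but not prove it.

```python
import random

def average_estimate(n, p, samples=2000, seed=1):
    """Average of X/n over many simulated binomial samples (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    total = 0.0
    for _ in range(samples):
        x = sum(rng.random() < p for _ in range(n))  # one binomial observation
        total += x / n
    return total / samples

print(average_estimate(50, 0.3))
```

Individual estimates X/n vary from sample to sample, but their average settles near the true p.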

This gives us a way of estimating p by a single value. This single value is dependent 
on the sample and if we choose another sample, we are likely to find another value of X, 
and hence arrive at another estimate of p. Could we also find a “likely” range for the value 
of p? 

To answer this, consider a related question. If we have a binomial situation with prob- 
ability p and sample size n, what is a “likely” range for the observed values of the random 
variable, X? The answer of course depends on the meaning of the word “likely.” Suppose 
that a likely range for the values of a random variable is a range in which the values of the 
variable occur with probability 0.95. 

With the considerable aid of our computer algebra system, we can consider a number 
of different binomial distributions. We vary n, the number of observations, and p, the prob- 
ability of success. In each case, we find the proportion of the values of X that lie within 
two standard deviations of the mean, that is, the proportion of the values of X that lie in the 
interval $\mu \pm 2\sigma = n p \pm 2\sqrt{n p (1-p)}$. We selected the constant 2 because we need to
find a range that includes a large portion—95%—of the values of X and 2 would appear 
to be a reasonable multiplier for the standard deviation. Table 2.1 shows the results of these 


Table 2.1 Exact probability of binomial intervals around the mean for various values of n and p

n        p      $\sigma$   $\mu \pm 2\sigma$   P

36       1/2    3          12, 24              0.971183
64       1/2    4          24, 40              0.967234
100      1/2    5          40, 60              0.964800
144      1/2    6          60, 84              0.963148
196      1/2    7          84, 112             0.961530
18       1/3    2          2, 10               0.978800
72       1/3    4          16, 32              0.967288
162      1/3    6          42, 66              0.963177
288      1/3    8          80, 112             0.961066
48       1/4    3          6, 18               0.971345
192      1/4    6          36, 60              0.963214
432      1/4    9          90, 126             0.960373
10000    1/2    50         4900, 5100          0.954494
11250    1/3    50         3650, 3850          0.954497
13872    1/4    51         3366, 3570          0.954499
calculations. Here, P represents the probability an observed value of the random variable
X lies in the interval $\mu \pm 2\sigma = n p \pm 2\sqrt{n p (1-p)}$. The values of n and p have been
chosen so that the end points of the intervals are integers.
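
The entries of Table 2.1 can be checked with a few lines of code. The following is a minimal sketch in Python, used here in place of a computer algebra system; the function name `prob_within_two_sigma` is made up for this illustration. It uses exact rational arithmetic and assumes, as in the table, that n and p are chosen so that both the mean and the standard deviation are integers, and that the end points of the interval are included.

```python
from fractions import Fraction
from math import comb, isqrt


def prob_within_two_sigma(n, p):
    """Exact P(mu - 2*sigma <= X <= mu + 2*sigma) for X ~ Binomial(n, p).

    p must be a Fraction. The mean n*p and the variance n*p*(1-p) are
    assumed to be an integer and a perfect square, as in Table 2.1.
    """
    q = 1 - p
    mu = n * p
    var = n * p * q
    if mu.denominator != 1 or var.denominator != 1:
        raise ValueError("n*p and n*p*q must be integers")
    sigma = isqrt(int(var))
    if sigma * sigma != var:
        raise ValueError("n*p*q must be a perfect square")
    lo, hi = int(mu) - 2 * sigma, int(mu) + 2 * sigma
    # Sum the binomial probabilities over the closed interval [lo, hi].
    total = sum(comb(n, x) * p**x * q**(n - x) for x in range(lo, hi + 1))
    return float(total)


for n, p in [(36, Fraction(1, 2)), (18, Fraction(1, 3)), (48, Fraction(1, 4))]:
    print(n, p, round(prob_within_two_sigma(n, p), 6))
```

The same function handles the large cases in the last three rows of the table, although exact fractions with n in the thousands take noticeably longer to evaluate.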

We are led to believe from the table, regardless of the value of p, that at least 95% of the 
values of the variable X lie in the interval $\mu \pm 2\sigma$. (Later we will show, for large values of n,
regardless of the value of p, that the probability is approximately 0.9545, a result supported 
by our calculations.) So we have 


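\[ P(\mu - 2\sigma \le X \le \mu + 2\sigma) \ge 0.95. \tag{2.5} \]

Solving the inequalities for $\mu$, we have

\[ P(X - 2\sigma \le \mu \le X + 2\sigma) \ge 0.95. \tag{2.6} \]

Replacing $\mu$ and $\sigma$ by $n p$ and $\sqrt{n p q}$, respectively, (2.6) becomes

\[ P\left(X - 2\sqrt{n p q} \le n p \le X + 2\sqrt{n p q}\right) \ge 0.95. \tag{2.7} \]

The inequalities in (2.7) can now be solved for p. The result is

\[ P\left( \frac{2n + nX - 2\sqrt{n^2 X + n^2 - n X^2}}{n^2 + 4n} \le p \le \frac{2n + nX + 2\sqrt{n^2 X + n^2 - n X^2}}{n^2 + 4n} \right) \ge 0.95. \tag{2.8} \]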


Our thinking here is as follows: if we find an interval that contains at least 95% of the values 
of X and if p is unknown, then those same values of X will produce an interval in which p, 
in some sense, is likely to lie. The end points produced by formula (2.8) comprise what we 
call a 95% confidence interval for p. 

While (2.5) gives a likely range of values of X if p is known, (2.8) gives a likely range 
of values of p if X is known. So we have a response to a variant of our first question: If X 
successes are observed in n binomial trials, what is a likely value for p? 

We note that (2.5) is a legitimate probability statement since X is a random variable and 
95% of its values lie in the stated interval. However, (2.8) is not a probability statement! 
Why not? The reason is that p is an unknown constant. Either it lies in the stated interval 
or it does not. Then, what does the 95% mean? 

Here is a way of looking at this. Consider samples of fixed size, say n = 100. If we find 
25 successes in these 100 trials (where X = 25), then (2.8) gives the interval 0.174152 < 
p < 0.345079. However, the next time we perform the experiment, we are most likely to 
find another value of X, and hence another confidence interval. For example, if X = 30, the 
confidence interval is 0.217492 < p < 0.397893. From (2.8), we see that these confidence 


intervals are centered about the value $\frac{2n + nX}{n^2 + 4n} = \frac{X + 2}{n + 4}$ and have width
$\frac{4\sqrt{n^2 X + n^2 - n X^2}}{n^2 + 4n}$, so both the center and width of the intervals change
as X changes for a fixed sample size n. This gives us a
proper interpretation of (2.8): 95% of these intervals will contain the unknown, and fixed,
value p.
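
As a check on the arithmetic, the end points in formula (2.8) can be evaluated directly. The sketch below is illustrative only; the function name `binomial_ci_95` is an assumption of this example, not notation from the text. It reproduces the two intervals quoted for n = 100 with X = 25 and X = 30.

```python
from math import sqrt


def binomial_ci_95(x, n):
    """End points of the interval in formula (2.8) for x successes in n trials."""
    root = 2 * sqrt(n * n * x + n * n - n * x * x)
    numerator_center = 2 * n + n * x
    denominator = n * n + 4 * n
    return (numerator_center - root) / denominator, (numerator_center + root) / denominator


for x in (25, 30):
    low, high = binomial_ci_95(x, 100)
    center = (low + high) / 2
    width = high - low
    print(x, round(low, 6), round(high, 6), round(center, 6), round(width, 6))
```

Printing the center and width alongside the end points makes it easy to see that both change with X when n is held fixed.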

Figure 2.11 Some confidence intervals.


As another example, 15 observations were taken from a binomial distribution with 
n = 100 and gave the following values for X: 40, 44, 29, 43, 43, 42, 39, 40, 43, 42, 36, 44, 
35, 39, and 42. Formula (2.8) was then used to compute a confidence interval for p for each 
of these values of X. 

Figure 2.11 shows these confidence intervals. As expected, they vary in both position 
and width. The actual value of p used to generate the X values was 0.40. As it happens 
here, p = 0.40 is contained in 14 of the 15 confidence intervals, but in larger samples 
we would expect that 0.40 would be contained in about 95% of the confidence intervals 
produced. 
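
The count of 14 out of 15 can be reproduced, and the long-run behavior examined, with a short computation. The sketch below is a hedged illustration: `binomial_ci_95` and `coverage` are names invented here, and `coverage` simply adds the binomial probabilities of those values of X whose interval from (2.8) contains the true p.

```python
from math import comb, sqrt


def binomial_ci_95(x, n):
    """End points of the interval in formula (2.8) for x successes in n trials."""
    root = 2 * sqrt(n * n * x + n * n - n * x * x)
    denominator = n * n + 4 * n
    return (2 * n + n * x - root) / denominator, (2 * n + n * x + root) / denominator


def coverage(n, p):
    """Exact probability that the interval from (2.8) contains the true p."""
    total = 0.0
    for x in range(n + 1):
        low, high = binomial_ci_95(x, n)
        if low <= p <= high:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


observed = [40, 44, 29, 43, 43, 42, 39, 40, 43, 42, 36, 44, 35, 39, 42]
hits = sum(1 for x in observed
           if binomial_ci_95(x, 100)[0] <= 0.40 <= binomial_ci_95(x, 100)[1])
print("intervals containing 0.40:", hits, "of", len(observed))
print("exact coverage for n = 100, p = 0.40:", round(coverage(100, 0.40), 4))
```

Only the interval computed from X = 29 misses p = 0.40. The exact coverage is a property of the procedure for this particular n and p; it should be close to, though not necessarily exactly, 0.95.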


EXERCISES 2.6 


1. In the above-mentioned text, we drew 15 observations from a binomial distribution with 
n= 100. Calculate the end points of a 95% confidence interval for X = 40 as shown in 
Figure 2.11. 


2. If ten 95% confidence intervals for an unknown binomial p are calculated for samples 
of size 50, what is the probability that p is contained in exactly 6 of them? 


3. If a sample of size 30 is chosen from a binomial distribution with p = 1/2, and if X 
denotes the number of successes obtained, find an interval in which 95% of the values 
of X will lie. 

4. Use your computer algebra system to verify the results in Table 2.1 for 
(a) p= 1/2, n= 36 
(b) p= 1/3, n= 18 
(c) p= 1/4, n= 48 
5. Use your computer algebra system to verify the result in Table 2.1 for
(a) p = 1/2, n = 10000
(b) p = 1/3, n = 11250
(c) p = 1/4, n = 13872


6. A survey of 300 college students found that 50 are thinking about changing their
majors. Find a 95% confidence interval for the true proportion of college students
thinking about changing their majors.


7. A random sample of 1250 voters was asked whether or not they voted in favor of
a school bond issue. Of these, 325 replied that they favored the issue. Find a
95% confidence interval for the true proportion of voters who favor the school bond
issue.


8. Find 90% confidence intervals by constructing a table similar to Table 2.1. One should
find that $P(\mu - 1.645\sigma \le X \le \mu + 1.645\sigma) \approx 0.90$.


9. A newspaper survey of 125 of its subscribers found that 40% of the respondents knew
someone who was killed or injured by a drunk driver. Find a 90% confidence interval
for the true proportion of people in the population who know someone who was killed
or injured by a drunk driver.


10. As a project in a probability course, a student discovered that among a random sample
of 80 families, 25% did not have checking accounts. Use this information to construct
a 90% confidence interval for the true proportion of families in the population who do
not have checking accounts.


11. A study showed that 1/8th of American workers worked in management or in admin-
istration, while 1/27th of Japanese workers worked in management or administration.
The study was based on 496 American workers and 810 Japanese workers.

Is it possible that the same proportion of American and Japanese workers are in
management or administration and that the apparent differences found by the study
are simply due to the variation inherent in sampling? [Hint: Compare 90% confidence
intervals.]


12. n values of X, the number of successes in a binomial process, are used to compute n
95% confidence intervals for the unknown parameter p. Find the probability that p lies
in exactly k of the n confidence intervals.


2.7 HYPOTHESIS TESTING: BINOMIAL 
RANDOM VARIABLES 


In the previous section, we considered confidence intervals for binomial random variables. 
The problem of estimating a parameter, in this case the value of p by means of an interval, is 
part of statistics or statistical inference. Statistical inference, in simplest terms, is concerned 
with drawing inferences from data that have been gathered by a sampling process. Statistical 
inference comprises the theory of estimation and that of hypothesis testing. In the preceding 
section, we considered the construction of a confidence interval that is part of the theory 
of estimation. The remaining portion of the theory of drawing inferences from samples is 
called hypothesis testing. We begin with a somewhat artificial example in order to fix ideas 
and define some vocabulary before proceeding to other applications. 

Example 2.7.1


The manufacturing process of a sensitive component has been producing items of which 
20% must be reworked before they can be used. A recent sample of 20 items shows 6 items 
that must be reworked. Has the manufacturing process changed so that 30% of the items 
must be reworked? 

Assume that the production process is binomial, with p, which is of course unknown to
us, denoting the probability an item must be reworked. We begin with a hypothesis or con-
jecture about the binomial process: that it has not in fact changed, and that the proportion of
items that must be reworked is 20%. We denote this by $H_0$ and we call this the null hypoth-
esis. As a result of a test (in this case the result of a sample of the items) this hypothesis
will be accepted (i.e., we will believe that $H_0$ is true) or it will be rejected (i.e., we will
believe that $H_0$ is not true). In the latter case, when the null hypothesis is rejected, we agree
to accept an alternative hypothesis, $H_a$. Here, the hypotheses are chosen as follows:

$H_0$: p = 0.20
$H_a$: p = 0.30.

How are sample results (in this case 6 items that must be reworked) to be interpreted? Does
this information lead to the acceptance or the rejection of $H_0$? We must then decide what
sample results lead to the acceptance of $H_0$ and what sample results lead to its rejection
(and hence the acceptance of $H_a$).

The sampling is, of course, subject to variability and our conclusions cannot be reached
without running the risk of error. There are two risks: that we will reject $H_0$ even though it
is, in reality, true, or that we will accept $H_0$ even though it is, in reality, false. The following
table may help in seeing the four possibilities that exist whenever a hypothesis is tested:


                              Reality
                   $H_0$ True                  $H_0$ False
$H_0$ Rejected     Type I error ($\alpha$)     Correct decision
$H_0$ Accepted     Correct decision            Type II error ($\beta$)


We never will know reality, but the table does indicate the consequences of the decision
process. It is customary to denote the two types of errors by

$\alpha$ = Probability of a Type I error
   = P[$H_0$ is rejected when it is true] and
$\beta$ = Probability of a Type II error
   = P[$H_0$ is accepted when it is false].

Both $\alpha$ and $\beta$ are conditional probabilities, and each is highly dependent on the set of sample
values that lead to the rejection of the hypothesis. This set of values is called the critical
region.

What should the critical region be? We are free to choose any critical region we want; it
would appear sensible in this case to conclude that the percentage of product to be reworked
had increased when the number of items to be reworked in the sample is large. Therefore, we
arbitrarily take as a critical region $\{x \mid x \ge 9\}$, where X is the random variable denoting the
number of items in the sample that must be reworked.

What are the consequences of this choice for the critical region? We can calculate $\alpha$,
the size of the Type I error:

\[
\begin{aligned}
\alpha &= P[X \ge 9 \text{ if } H_0 \text{ is true}] \\
&= P[X \ge 9 \text{ if } p = 0.2] \\
&= \sum_{x=9}^{20} \binom{20}{x} (0.2)^x (0.8)^{20-x} \\
&= 0.00998179 \approx 0.01.
\end{aligned}
\]

So about 1% of the time, this critical region will reject a true hypothesis. This means
that the manufacturing process is such that if p = 0.20, about 1% of the time it will behave
as if p = 0.30 with this critical region. $\alpha$ is called the size or the significance level of the
test.

What is $\beta$?

\[
\begin{aligned}
\beta &= P[\text{accept } H_0 \text{ if it is false}] \\
&= P[X < 9 \text{ if } H_0 \text{ is false}] \\
&= P[X < 9 \text{ if } p = 0.30] \\
&= \sum_{x=0}^{8} \binom{20}{x} (0.30)^x (0.70)^{20-x} \\
&= 0.886669.
\end{aligned}
\]


These calculations are shown in Appendix A. 
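
For readers without the appendix at hand, the two error sizes can also be computed with a short program. This is a minimal sketch, assuming a critical region of the form x >= k; the function names `binom_pmf` and `tail_test_errors` are invented for this example.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def tail_test_errors(k, n, p0, p1):
    """Sizes of the two errors when H0: p = p0 is rejected for X >= k.

    alpha = P(X >= k | p = p0); beta = P(X <= k - 1 | p = p1).
    """
    alpha = sum(binom_pmf(x, n, p0) for x in range(k, n + 1))
    beta = sum(binom_pmf(x, n, p1) for x in range(0, k))
    return alpha, beta


alpha, beta = tail_test_errors(k=9, n=20, p0=0.20, p1=0.30)
print(f"alpha = {alpha:.8f}, beta = {beta:.6f}")
```

Changing `k` shows immediately how moving the boundary of the critical region trades one error against the other.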

So, with this critical region, about 89% of the time a process producing 30% items to
be reworked behaves as if it were producing only 20% of such items. This might appear to
be a very high risk. Can it be reduced? One way to reduce $\beta$ would be to change the critical
region to, say, $\{x \mid x \ge 8\}$. We now find that

\[ \beta = \sum_{x=0}^{7} \binom{20}{x} (0.30)^x (0.70)^{20-x} = 0.772272 \]

but then

\[ \alpha = \sum_{x=8}^{20} \binom{20}{x} (0.20)^x (0.80)^{20-x} = 0.032147. \]

So the decrease in $\beta$ comes at the cost of an increase in $\alpha$. We will see later that
one way to decrease both errors is to increase the sample size.

What are the consequences of other choices for the critical region? We could choose
x = 0 for the critical region so that the hypothesis is rejected only if x = 0. Then

\[ \alpha = P[X = 0 \text{ if } p = 0.20] = (0.8)^{20} = 0.0115292, \]

producing a Type I error of about the same size as it was before. But then

\[ \beta = \sum_{x=1}^{20} \binom{20}{x} (0.30)^x (0.70)^{20-x} = 0.999202. \]

These two critical regions then have roughly equal Type I errors, but $\beta$ is larger for the
second choice of critical region.

We will choose one more critical region whose Type I error is about 0.01: the critical
region X = 9, 10, or 11. Then

\[ \alpha = P[X = 9, 10, \text{ or } 11 \text{ if } p = 0.20] = \sum_{x=9}^{11} \binom{20}{x} (0.20)^x (0.80)^{20-x} = 0.00988, \]

again roughly 0.01.

$\beta$, however, is now

\[ \beta = 1 - \sum_{x=9}^{11} \binom{20}{x} (0.30)^x (0.70)^{20-x} = 0.891807. \]


The earlier cases illustrate that there are several choices for critical regions that
give roughly the same size for Type I error; we will call the critical region best, if for a given Type I
error, it minimizes Type II error. In this case, the best critical region for a test with $\alpha$ = 0.01
is $\{x \mid x \ge 9\}$. Best critical regions can often, but not always, be constructed.
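
The comparison among critical regions can be laid out side by side. The sketch below is illustrative; `error_sizes` and the `regions` dictionary are names chosen for this example. A critical region is represented simply as a collection of values of x, so regions that are not of the form x >= k are handled in the same way.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def error_sizes(region, n=20, p0=0.20, p1=0.30):
    """Return (alpha, beta) for a critical region given as a collection of x values."""
    alpha = sum(binom_pmf(x, n, p0) for x in region)        # reject a true H0
    beta = 1.0 - sum(binom_pmf(x, n, p1) for x in region)   # accept a false H0
    return alpha, beta


regions = {
    "x >= 9": range(9, 21),
    "x >= 8": range(8, 21),
    "x = 0": [0],
    "x in {9, 10, 11}": [9, 10, 11],
}
for label, region in regions.items():
    alpha, beta = error_sizes(region)
    print(f"{label:18s} alpha = {alpha:.6f}   beta = {beta:.6f}")
```

Among the three regions whose Type I error is near 0.01, the region x >= 9 has the smallest Type II error, which is the sense in which it is called best here.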

So, to return to the original problem where the sample yielded six items for reworking,
we conclude that the process has not changed, since x = 6 is not in the critical region $\{x \mid x \ge 9\}$
for $\alpha$ = 0.01.

Finally, we note that the size of the Type II error, $\beta$, is a function of the alternative, p = 0.30,
in this example. If the alternative hypothesis were $H_a$: p > 0.20, then $\beta$ could be calculated
for any particular alternative in $H_a$. That is, if p > 0.20, then

\[ \beta = \sum_{x=0}^{8} \binom{20}{x} p^x (1 - p)^{20-x}, \quad \text{a function of } p. \]


As p increases, $\beta$ decreases quite rapidly, reflecting the fact that it is increasingly
unlikely that the hypothesis will be accepted if it is false. A graph of $\beta$ as a function of
p is shown in Figure 2.12.

It is customary to graph $1 - \beta$ = P[a false $H_0$ is rejected]. This is called the power
function for the test.
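
The curve for $\beta$ and the power function can be tabulated numerically. The following is a small sketch for the test of Example 2.7.1 (n = 20, critical region x >= 9); the function names `beta` and `power` are chosen for this illustration.

```python
from math import comb


def beta(p, n=20, k=9):
    """P(X <= k - 1) for X ~ Binomial(n, p): the chance of accepting H0
    when the critical region is {x >= k} and the true parameter is p."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k))


def power(p, n=20, k=9):
    """1 - beta(p): the probability that H0 is rejected when the true parameter is p."""
    return 1.0 - beta(p, n, k)


for p in (0.2, 0.3, 0.4, 0.5, 0.6, 0.7):
    print(f"p = {p:.1f}   beta = {beta(p):.4f}   power = {power(p):.4f}")
```

At p = 0.20 the power equals the size of the test, and it rises toward 1 as p moves away from the null value.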

The hypothesis $H_0$: p = 0.20 is called a simple hypothesis since it completely
specifies the probability distribution of the variable under consideration. The hypothesis
$H_a$: p > 0.20 is composed of an infinity of simple hypotheses. It is called a composite
hypothesis.

Figure 2.12 $\beta$ as a function of p for Example 2.7.1.


Example 2.7.2 


In the previous example, the critical region was specified and then values for $\alpha$ and $\beta$ were
found. It is common, however, for experimenters to specify $\alpha$ and $\beta$ before the experiment
is done; often the sample size necessary to achieve these probabilities can be found, at least
approximately. One of the consequences of the binomial model in the preceding example
is that a change in the critical region by a single unit produces large changes in $\alpha$ and $\beta$.
Suppose, in the preceding example, that it is desired to have, approximately, $\alpha$ = 0.05 and
$\beta$ = 0.10. If we assume that the best critical region is of the form $\{x \mid x \ge k\}$, then

\[ \alpha = \sum_{x=k}^{n} \binom{n}{x} (0.20)^x (0.80)^{n-x} = 0.05 \]

and

\[ \beta = \sum_{x=0}^{k-1} \binom{n}{x} (0.30)^x (0.70)^{n-x} = 0.10. \]


These equations are difficult to solve without the aid of extensive binomial tables or a
computer algebra system. We find that

\[ \alpha = \sum_{x=40}^{156} \binom{156}{x} (0.20)^x (0.80)^{156-x} = 0.05145 \]

and

\[ \beta = \sum_{x=0}^{39} \binom{156}{x} (0.30)^x (0.70)^{156-x} = 0.09962, \]

so n = 156 and k = 40. These values are probably close enough for all practical purposes.
Other solutions are of course possible, depending on the closeness with which we want to
solve the equations for $\alpha$ and $\beta$. It may well be that we cannot carry out an experiment with
this large sample size; such a restriction would obviously then have implications for the
sizes of $\alpha$ and $\beta$ that can be entertained.
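
A search of this kind is easy to automate. The sketch below is one possible approach and not necessarily the one used to obtain the values above: for each n it takes the smallest k whose Type I error does not exceed a stated bound and then checks the Type II error. The function names `errors` and `find_design`, and the bounds passed in the usage lines, are assumptions of this example.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def errors(n, k, p0=0.20, p1=0.30):
    """(alpha, beta) for the critical region {x >= k} with sample size n."""
    alpha = sum(binom_pmf(x, n, p0) for x in range(k, n + 1))
    beta = sum(binom_pmf(x, n, p1) for x in range(k))
    return alpha, beta


def find_design(alpha_max, beta_max, p0=0.20, p1=0.30, n_max=400):
    """Smallest n (with its k) such that alpha <= alpha_max and beta <= beta_max."""
    for n in range(1, n_max + 1):
        pmf0 = [binom_pmf(x, n, p0) for x in range(n + 1)]
        tail, k = 0.0, n + 1
        # Lower k while the upper tail under p0 still fits within alpha_max.
        while k > 0 and tail + pmf0[k - 1] <= alpha_max:
            k -= 1
            tail += pmf0[k]
        beta = sum(binom_pmf(x, n, p1) for x in range(k))
        if beta <= beta_max:
            return n, k, tail, beta
    return None


print(errors(156, 40))            # the design reported in the text
print(find_design(0.055, 0.105))  # a design under slightly relaxed bounds
```

Because the binomial distribution is discrete, the answer depends on how strictly the bounds are enforced, which is why several solutions are possible.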

EXERCISES 2.7


1. A manufacturer of a new electronic tablet device wants to determine the proportion
of current tablet users who would purchase a new version of the tablet. The manufac-
turer thinks that 15% of current users would purchase the new tablet. For a test, the
hypotheses are as follows:

$H_0$: p = 0.15
$H_a$: p > 0.15.

(a) Find $\alpha$ if the critical region is X > 30 for a sample of 150 tablet users.
(b) Find $\beta$ for $H_a$: p = 0.25.

2. A new car dealer tests customers who will pay $1000 down for free financing for 2
years. A sample of 20 buyers is taken; X is the number of customers who will take the
financing deal. The hypotheses are as follows:

$H_0$: p = 0.40
$H_a$: p > 0.40.

(a) Find $\alpha$ if the critical region is X < 8.
(b) Find $\beta$ for the alternative p = 0.50.

3. It is thought that 80% of VCR owners do not know how to program their VCR for
taping a TV program. To test this hypothesis, a sample of 20 VCR owners is chosen
and the proportion, p, who cannot program a VCR is recorded. The hypotheses are as
follows:

$H_0$: p = 0.80
$H_a$: p < 0.80.

(a) Find $\alpha$ if the critical region is X < 14 where X is the number in the sample who
cannot program a VCR.
(b) Find $\beta$ for the alternative $H_a$: p = 0.70.
(c) Graph $\beta$ as a function of p, 0 < p < 0.80.


4. A researcher speculates that 20% of the people in a very large group under study are
left-handed, a proportion much larger than the 10% of people who are left-handed in
the population. A sample is chosen to test

$H_0$: p = 0.10
$H_a$: p = 0.20.

The critical region is X > k, where X is the number of left-handed people in the sample.
It is desired to have $\alpha$ = 0.07 and $\beta$ = 0.13, approximately. How large a sample should
be chosen?


5. In exercise 4, show that $\beta$ is larger for the critical region X < c where c is chosen so
that the test has size $\alpha$.


6. A drug is thought to cure 2/3 of the patients with a disease; without the drug, 1/3 of the
patients recover. The hypothesis

$H_0$: p = 1/3 is tested against
$H_a$: p = 2/3

on the basis of a sample of 12 patients. $H_0$ is rejected if X, the number of patients in
the sample who recover, is greater than 5. Find $\alpha$ and $\beta$ for this test.


7. In exercise 6, find the sample size for which $\alpha$ = 0.05 and $\beta$ = 0.13, approximately.


8. A recent survey showed that 46% of Americans feel that they are “being left behind
by technology.” To test this hypothesis, a sample of 36 Americans showed that 18 of
them agreed that they were being left behind by technology. Does the data support the
hypothesis $H_0$: p = 0.46 against the alternative $H_a$: p > 0.46? (Use $\alpha$ = 0.05.)


9. A publisher thinks that 57% of the magazines on newsstands are unsold. To test this
hypothesis, a sample of 1000 magazines put on the newsstand resulted in 495 unsold
magazines. Does this data support $H_0$: p = 0.57 or the alternative $H_a$: p < 0.57 if $\alpha$ =
0.05?


10. A survey indicates that 41% of the people interviewed think that holders of Ph.D.
degrees have attended medical school. In a sample of 88 people, 50 agreed that Ph.D.’s
attended medical school. Is this evidence, using $\alpha$ = 0.05, that the percentage of people
thinking that Ph.D.’s are M.D.’s is greater than 41%?


11. In a survey of questions concerning health issues, 59% of the respondents thought that
at some time in their life they would develop cancer. If a sample of 200 people showed
that 89 agreed that they would develop cancer at some time, is this evidence to support
the hypothesis that the percentage thinking they will develop cancer is less than 59%?
(Use $\alpha$ = 0.05.)


12. Among Americans earning more than $50,000 per year, 2/3 of people agree that Ameri-
cans are “materialistic.” If 70 people out of 100 people interviewed agree that Ameri-
cans are materialistic, is this evidence that the true proportion thinking Americans are
materialistic is greater than 2/3? (Use $\alpha$ = 0.05.)


2.8 DISTRIBUTION OF A SAMPLE PROPORTION 


Before considering some important probability distributions in addition to the binomial
distribution, we consider here a common problem: a sample survey of n individuals indi-
cates that the proportion $p_s$ of the respondents favors a certain candidate in an election. $p_s$
is clearly a random variable since our sampling will not always produce exactly the same
proportion of voters favoring the candidate if the sampling is repeated. $p_s$ is called a sample
proportion. What is its probability distribution? How can we expect $p_s$ to vary from sample
to sample? If we observe a value of $p_s$ (say 51% of the voters favor a candidate) what
does this tell us about the true proportion of all voters who favor the candidate, say p? We
consider these questions now.

Let us suppose that in reality the proportion p of the voters favor a candidate. Let us
also assume that the sample is taken so that the responses can be assumed to be independent
among the people interviewed. The number of voters favoring the candidate, say X, is then
a binomial random variable since a voter favors either the candidate or the opponent. The
sample proportion favoring the candidate, $P_s$, is also a random variable. If we take a random
sample of size n, then

\[ P_s = \frac{X}{n}. \]

So our random variable $P_s$ is related to a binomial random variable. We considered
confidence intervals for binomial random variables in Section 2.6. We now extend that
theory somewhat.

We now calculate the mean and variance of the variable $P_s$. We let the sample propor-
tion be $p_s = \frac{x}{n}$. Clearly,


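\[ P(P_s = p_s) = P\left(\frac{X}{n} = p_s\right) = P(X = n p_s) = P(X = x) \]

so

\[
\begin{aligned}
E(P_s) &= \sum_{p_s=0}^{1} p_s \cdot P(P_s = p_s) \\
&= \sum_{x=0}^{n} \frac{x}{n} \cdot P\left(\frac{X}{n} = p_s\right) \\
&= \frac{1}{n} \sum_{x=0}^{n} x \cdot P(X = x).
\end{aligned}
\]

Therefore, $E(P_s) = \frac{1}{n} \cdot E(X) = \frac{1}{n} \cdot n \cdot p = p$.

So, as might be expected, the average value of the variable $P_s$ is the true proportion, p.
This is precisely the same result we saw in Section 2.6.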

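The variance of $P_s$ can be calculated using the variance of a binomial random variable
as follows:

\[
\begin{aligned}
\mathrm{Var}(P_s) = \mathrm{Var}\left(\frac{X}{n}\right) &= \sum_{p_s=0}^{1} (p_s - p)^2 \cdot P(P_s = p_s) \\
&= \sum_{x=0}^{n} \left(\frac{x}{n} - p\right)^2 \cdot P\left(\frac{X}{n} = p_s\right) \\
&= \frac{1}{n^2} \sum_{x=0}^{n} (x - n p)^2 \cdot P(X = n p_s) \\
&= \frac{1}{n^2} \sum_{x=0}^{n} (x - n p)^2 \cdot P(X = x)
\end{aligned}
\]

showing that

\[ \mathrm{Var}(P_s) = \frac{1}{n^2} \cdot \mathrm{Var}(X) \]

or that

\[ \mathrm{Var}(P_s) = \frac{1}{n^2} \cdot n \cdot p \cdot q = \frac{p q}{n}. \]

The earlier considerations also show a more general result: if random variables X
and Y are related by $Y = k \cdot X$, where k is a constant, then $E(Y) = k \cdot E(X)$ and
$\mathrm{Var}(Y) = k^2 \cdot \mathrm{Var}(X)$.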
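
These two results can be confirmed exactly for any particular n and p. The sketch below builds the distribution of the sample proportion from the binomial distribution using rational arithmetic; the function name `sample_proportion_moments` and the choice n = 10, p = 1/3 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def sample_proportion_moments(n, p):
    """Exact mean and variance of X/n where X ~ Binomial(n, p); p is a Fraction."""
    q = 1 - p
    # Probability attached to each possible value x/n of the sample proportion.
    dist = {Fraction(x, n): comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)}
    mean = sum(value * prob for value, prob in dist.items())
    var = sum((value - mean) ** 2 * prob for value, prob in dist.items())
    return mean, var


n, p = 10, Fraction(1, 3)
mean, var = sample_proportion_moments(n, p)
print(mean, var, p * (1 - p) / n)
```

Because the arithmetic is exact, the computed mean equals p and the computed variance equals pq/n with no rounding error.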

Using the facts we derived in Section 2.6 regarding binomial confidence intervals, we
can say that

\[ P\left( p_s - 2\sqrt{\frac{p q}{n}} \le p \le p_s + 2\sqrt{\frac{p q}{n}} \right) \ge 0.95 \]

giving a 95% confidence interval for the true population proportion, p. But, as occurred in
the binomial situation, the standard deviation is a function of the unknown p, so we must
solve for p. There are two ways to do this. One method is to solve the quadratic equations
that arise exactly. However, if 0.3 < p < 0.7, then a good approximation to $p \cdot q$ is 1/4. This
approximation is far from exact, but often yields acceptable results when p is in the indicated
range.


Example 2.8.1 


A sample survey of 400 voters showed that 51% of the voters favored a certain candidate. 
Find a 95% confidence interval for p, the true proportion of voters in the population favoring 
the candidate. 

We have that

\[ P\left( 0.51 - 2\sqrt{\frac{p q}{400}} \le p \le 0.51 + 2\sqrt{\frac{p q}{400}} \right) \ge 0.95. \]

If we solve the inequalities for p, noting that q = 1 − p, we find that

\[ \frac{n p_s + 2 - 2\sqrt{1 + n p_s - n p_s^2}}{n + 4} \le p \le \frac{n p_s + 2 + 2\sqrt{1 + n p_s - n p_s^2}}{n + 4}. \]

This result is equivalent to formula (2.8) in Section 2.6.

Substituting n = 400 and $p_s$ = 0.51 gives $P(0.46016 \le p \le 0.55964) \ge 0.95$ while
using the approximation $p \cdot q \approx 1/4$ gives $P(0.46 \le p \le 0.56) \ge 0.95$.

The difference in the confidence intervals is very small, but this is because the observed
proportion, 0.51, is close to 1/2. The two confidence intervals will deviate more markedly as
the difference between $p_s$ and 1/2 increases. The candidate certainly cannot feel confident
of winning the election on the basis of the sample, but we can only make this observation
since we have created a confidence interval for p. In the popular press, half the
width of the confidence interval is referred to as the sampling error. So a survey may
be reported with a sampling error of 3% meaning that a 95% confidence interval for p is
$p_s \pm 0.03$.
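
Both intervals in this example are simple to compute. The sketch below is illustrative: the function names are invented here, the first function evaluates the end points obtained by solving the quadratic inequalities, and the second uses the approximation that p times q is about 1/4.

```python
from math import sqrt


def proportion_ci_exact(p_s, n):
    """95% interval for p from solving the quadratic inequalities exactly."""
    root = 2 * sqrt(1 + n * p_s - n * p_s * p_s)
    return (n * p_s + 2 - root) / (n + 4), (n * p_s + 2 + root) / (n + 4)


def proportion_ci_quarter(p_s, n):
    """95% interval for p using the approximation p*q = 1/4."""
    half_width = 2 * sqrt(0.25 / n)  # this half-width is the "sampling error"
    return p_s - half_width, p_s + half_width


print(proportion_ci_exact(0.51, 400))
print(proportion_ci_quarter(0.51, 400))
```

Trying a sample proportion far from 1/2, such as 0.11, shows how much the two intervals can differ when the approximation is used outside its intended range.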

If the sampling error is given, then the sample size can be inferred. If the sampling
error is stated as 3%, then

\[ 2\sqrt{\frac{p q}{n}} = 0.03 \]

where, of course, the difficulty is that p is unknown. Note that $p \cdot q \approx 1/4$ if 0.3 < p < 0.7.
Using this approximation here, we conclude that $\frac{1}{\sqrt{n}} \approx 0.03$ so that $n \approx 1111$.

The approximation $p \cdot q \approx 1/4$ is usually used only if p is in the interval 0.3 < p < 0.7;
otherwise p is replaced by the sample proportion, $p_s$, in determining sample size.
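
The calculation of sample size from a stated sampling error amounts to solving the equation above for n. A minimal sketch follows; the function name `sample_size` and its default argument are assumptions of this example.

```python
def sample_size(sampling_error, pq=0.25):
    """Solve 2*sqrt(pq/n) = sampling_error for n.

    pq defaults to 1/4, the approximation used when 0.3 < p < 0.7; otherwise
    pass p_s*(1 - p_s) computed from the sample proportion.
    """
    return 4 * pq / sampling_error**2


print(round(sample_size(0.03)))               # using p*q = 1/4
print(round(sample_size(0.03, 0.11 * 0.89)))  # using a sample proportion of 0.11
```

The second call illustrates how much smaller the required sample is when the proportion is far from 1/2.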

We presumed earlier here that the sample of voters is a simple random one, and we 
further presumed that the people sampled will actually vote and that they have been candid 
with the interviewer concerning their voting preference. Samplers commonly call these 
presumptions into question and have a variety of ways of dealing with them. In addition, 
such samples are rarely simple random samples; all we can say here is that these variations 
in the sampling design have some effect on the sampling error. 


EXERCISES 2.8 


1. A random sample of 200 automobile registrations shows that 22 are Subarus. Find a 
95% confidence interval for the true proportion of Subaru registrations. 


2. Compare the result in exercise 1 by estimating p = 1/4.


3. A survey of 300 paperback novels showed that 47% could be classified as romance nov- 
els. Find an approximate 95% confidence interval for p, the true proportion of romance 
paperback novels. 


4. Records indicate that 1/8 of American children receive welfare payments. If this survey 
was based on 250 records, find an approximate 95% confidence interval for the true 
proportion of children who receive welfare payments. 


5. A random sample of 300 voters showed that 48% favored a candidate. Does an approx- 
imate 95% confidence interval indicate that it is possible for the candidate to win the 
election? 


6. A survey of 423 workers found that 1/9 were union members. Find an approximate 
95% confidence interval for the true proportion of union workers. 


7. The sampling error of a survey in a magazine was stated to be 5%. What was the sample 
size for the survey? 


8. A student conducted a project for a statistics course and found that 2/3 of the respon- 
dents in interviews of 120 people did not know that the Bill of Rights is the first ten 
amendments to the Constitution. Find an approximate 90% confidence interval for the 
true proportion of people who do not know that the Bill of Rights is the first ten amend- 
ments to the Constitution. 


9. A magazine devoted to health issues discovered that 3/5 of the time a visit to a physician 
resulted in a prescription. The survey was based on 130 telephone interviews. Use this 
data to construct an approximate 90% confidence interval for the true proportion of 
patients, given a prescription as a result of a visit to their physician. 

10. According to a recent study, 81% of college students say that they favor drug testing in 
the workplace. The study was conducted among 400 college students. Find an approx- 
imate 90% confidence interval for the true proportion of college students who favor 
drug testing in the workplace. 

11. Interviews of 150 patients recently tested for the HIV virus indicate that among those
whose tests indicate the presence of the virus, 1/2 did not know they had the virus prior 
to testing. Find an approximate 95% confidence interval for the proportion of people in 
the population whose tests indicate they have the HIV virus and who did not know this. 


12. A California automobile dealer knows that 1/10 of California residents own convert- 
ibles. Is the dealer likely (with probability 0.95) to sell at least 200 convertibles in the 
next 1000 sales? 


2.9 GEOMETRIC AND NEGATIVE 
BINOMIAL DISTRIBUTIONS 


We considered geometric random variables in Examples 2.3.5 and 2.3.8 where the random
variable of interest was the waiting time for the occurrence of a binomial event. A perfect
model for the geometric random variable is tossing a coin, loaded so that the probability of
coming up heads is p, until heads appear. If X denotes the number of tosses necessary and
if q = 1 − p, we have seen that

\[ P(X = x) = q^{x-1} \cdot p, \quad x = 1, 2, 3, \ldots \]

and that

\[ E(X) = \frac{1}{p} \quad\text{and}\quad \mathrm{Var}(X) = \frac{q}{p^2}. \]


Now suppose we wait until the second head appears when the loaded coin is tossed.
Let X denote the number of trials necessary for this event to occur. We want P(X = x), the
probability distribution for X. Since the last trial must be heads, the first x − 1 trials must
contain exactly one head and x − 2 tails; since the trials are independent, and since the
single head can occur in any of x − 1 places, it follows that

\[ P(\text{first } x - 1 \text{ trials have exactly 1 head and } x - 2 \text{ tails}) = \binom{x-1}{1} q^{x-2} p. \]

So, since the last trial must be heads,

\[ P(X = x) = \binom{x-1}{1} q^{x-2} \cdot p \cdot p, \quad x = 2, 3, 4, \ldots \tag{2.9} \]

Since formula (2.9) exhausts the possibilities, it must be that $\sum_{x=2}^{\infty} P(X = x) = 1$.
One way to verify this is to notice that

\[ \sum_{x=2}^{\infty} \binom{x-1}{1} q^{x-2} p^2 = p^2 \left(1 + 2q + 3q^2 + 4q^3 + \cdots\right) = p^2 (1 - q)^{-2} = 1 \]

by the binomial theorem with a negative exponent. This series will arise again in our work. 
We have established the probability distribution for the waiting time for the second head. 
What is the average waiting time for the second head? We might reason as follows: we 
flip the coin until the first head appears; the average number of flips is 1/p. But then the 
situation is exactly the same as it was for the first flip of the coin; the fact that we flipped 
the coin and waited for the first head has absolutely no influence on subsequent tosses of 
the coin. We must wait an average of 1/p flips again until the second head appears. So the
average waiting time for the second head to appear is $\frac{1}{p} + \frac{1}{p} = \frac{2}{p}$. It follows that if we were
to wait for the rth head to appear, the average total waiting time would be r/p. We will give
a more formal derivation of this result later.

What is the probability distribution function for the rth head to appear? Let X denote
the number of tosses until the rth head appears. Since, again, the last toss must be heads
and the first x − 1 tosses must contain exactly r − 1 heads:

\[ P(X = x) = \binom{x-1}{r-1} p^{r-1} q^{x-r} \cdot p, \quad x = r, r+1, r+2, \ldots \tag{2.10} \]

Since P(X = x) ≥ 0, we must check the sum of the probabilities to see that we have a
probability distribution function.

\[ \sum_{x=r}^{\infty} \binom{x-1}{r-1} p^r q^{x-r} = p^r \sum_{x=r}^{\infty} \binom{x-1}{r-1} q^{x-r} = p^r (1 - q)^{-r} = 1, \]

so P(X = x) is a probability distribution.

If r = 1 in (2.10), we find that P(X = x) reduces to the geometric probability distribu- 
tion function. The result in (2.10) is called the negative binomial distribution because of 
the occurrence of the binomial expansion with a negative exponent. 
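
As a numerical illustration of (2.10), the following Python sketch evaluates the probabilities and checks that they sum to 1 and that r = 1 gives the geometric case. The function name `neg_binomial_pmf` and the truncation point of the sum are choices made here for illustration, not part of the text.

```python
from math import comb

def neg_binomial_pmf(x, r, p):
    """P(X = x): the r-th head occurs on toss x, following (2.10)."""
    if x < r:
        return 0.0
    q = 1.0 - p
    return comb(x - 1, r - 1) * p**r * q**(x - r)

# Truncated sum of the probabilities for r = 5, p = 1/2 (the case in Figure 2.13).
# The cutoff 400 is an assumption; the neglected tail is negligible here.
total = sum(neg_binomial_pmf(x, 5, 0.5) for x in range(5, 400))
print(total)  # very close to 1

# With r = 1 the formula reduces to the geometric probabilities q**(x - 1) * p.
print(neg_binomial_pmf(4, 1, 0.3), 0.7**3 * 0.3)
```

The sum is only truncated, not exact, so agreement with 1 is up to floating-point error.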

We now calculate the mean and the variance of this negative binomial random variable. 
We reasoned that the mean is r/p and we now give another derivation of this.

By the definition of expected value,

$$E(X) = \sum_{x=r}^{\infty} x \binom{x-1}{r-1} p^{r} q^{x-r} = r\, p^{r} \sum_{x=r}^{\infty} \binom{x}{r} q^{x-r} = r\, p^{r} (1-q)^{-(r+1)} = \frac{r}{p}.$$

Now we seek the variance of this negative binomial random variable. Since E(X^2) is
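difficult to find directly, we resort to the fact that

$$\mathrm{Var}(X) = E[X(X+1)] - E(X) - [E(X)]^{2}.$$

Now

$$E[X(X+1)] = \sum_{x=r}^{\infty} x(x+1) \binom{x-1}{r-1} p^{r} q^{x-r} = r(r+1)\, p^{r} \sum_{x=r}^{\infty} \binom{x+1}{r+1} q^{x-r} = r(r+1)\, p^{r} (1-q)^{-(r+2)} = \frac{r(r+1)}{p^{2}}.$$

Since E(X) = r/p, it follows that

$$\mathrm{Var}(X) = \frac{r(r+1)}{p^{2}} - \frac{r}{p} - \left(\frac{r}{p}\right)^{2} = \frac{rq}{p^{2}}.$$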


It is also useful to view the above-mentioned random variable X as a sum of other
random variables. Let X_1 denote the number of trials up to and including the first success,
X_2 denote the number of trials after the first success until the second success, and so on. It
follows that

$$X = X_1 + X_2 + \cdots + X_r.$$

Each of the X_i's has mean 1/p and variance q/p^2. We see that

$$E(X) = \frac{r}{p} = E\left(\sum_{i=1}^{r} X_i\right) = \sum_{i=1}^{r} E(X_i)$$

and in this case

$$\mathrm{Var}(X) = \frac{rq}{p^{2}} = \mathrm{Var}\left(\sum_{i=1}^{r} X_i\right) = \sum_{i=1}^{r} \mathrm{Var}(X_i),$$

verifying results that were previously obtained.

The fact that the expectation of a sum is the sum of the expectations is generally true; 
the fact that the variance of a sum is the sum of the variances requires independence of the 
summands. We will discuss these facts in a more thorough manner in Chapter 5. 
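
The values E(X) = r/p and Var(X) = rq/p^2 can also be checked numerically by truncating the defining sums. The sketch below does this for r = 5 and p = 0.4; the helper name `neg_binomial_moments` and the number of terms kept are assumptions made for illustration.

```python
from math import comb

def neg_binomial_moments(r, p, terms=500):
    """Approximate E(X) and Var(X) by truncating the sums after `terms` values of x."""
    q = 1.0 - p
    mean = 0.0
    second = 0.0
    for x in range(r, r + terms):
        prob = comb(x - 1, r - 1) * p**r * q**(x - r)
        mean += x * prob
        second += x * x * prob
    return mean, second - mean**2

r, p = 5, 0.4
mean, var = neg_binomial_moments(r, p)
print(mean, r / p)              # truncated sum versus r/p
print(var, r * (1 - p) / p**2)  # truncated sum versus rq/p^2
```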

In Figure 2.13, we show a graph of the negative binomial distribution with r = 5 and 
p = 1/2. It shows that the probabilities increase to a maximum and then decline to become 
asymptotic to the x-axis as (2.10) would lead us to suspect. 

It is also interesting to consider the total number of failures that precede the last success.
If Y denotes the number of failures preceding the rth success, then

$$P(Y = y) = \binom{y+r-1}{y} p^{r} q^{y}, \quad y = 0, 1, 2, \ldots,$$

which is also a negative binomial distribution. Here, E(Y) = E(X - r) = E(X) - r =
r/p - r = rq/p and Var(Y) = rq/p^2.


We now consider three fairly complex examples involving the negative binomial distribution.
Each involves special techniques.

Figure 2.13 A negative binomial distribution. (Probability plotted against x.)


Example 2.9.1 All Heads 


I have some fair coins. I toss them once, together, and set aside any that come up heads. 
I continue to toss the coins remaining, on each toss removing those that come up heads, 
until all of the coins have come up heads. On average, how many (group) tosses will I have 
to make? 

The problem is probably a bit hard at this point, so let us analyze the situation with 
only two fair coins. 

Since the waiting time for heads with either coin is a geometric variable, we are inter- 
ested in the maximum value of two geometric variables. Let Y be the random variable 
denoting the number of group tosses that must be made. We seek P(Y = y). The last head 
can occur at the yth toss in two mutually exclusive ways: 


1. Both coins come up tails for y - 1 tosses and then both come up heads on the yth toss,
or

2. Exactly one of the coins comes up heads on one of the first y - 1 tosses, followed
by a head on the remaining coin on the yth toss.

The first of these possibilities has probability $\left(\frac{1}{4}\right)^{y-1} \cdot \frac{1}{4}$. To calculate the second,
suppose first that there are j - 1 tosses where both coins show tails. Then one of the coins
comes up heads on the jth toss. Finally, the single remaining coin is tossed giving y - j - 1
tails followed by heads on the yth toss.

This sequence of events has probability

$$\left(\frac{1}{4}\right)^{j-1} \cdot \binom{2}{1} \frac{1}{2} \cdot \frac{1}{2} \cdot \left(\frac{1}{2}\right)^{y-j-1} \cdot \frac{1}{2}.$$

To find the probability for the second possibility, we must sum this expression
over all possible values of j. Thus, the second possibility has probability

$$\sum_{j=1}^{y-1} \left(\frac{1}{4}\right)^{j-1} \cdot \frac{1}{2} \cdot \left(\frac{1}{2}\right)^{y-j}.$$

So, putting these results together,

$$P(Y = y) = \left(\frac{1}{4}\right)^{y} + \sum_{j=1}^{y-1} \left(\frac{1}{4}\right)^{j-1} \cdot \frac{1}{2} \cdot \left(\frac{1}{2}\right)^{y-j}.$$

This reduces to

$$P(Y = y) = \frac{2^{y+1} - 3}{4^{y}}, \quad y = 1, 2, 3, \ldots$$

A computer algebra system shows that the mean, and also the variance, of this distribution
is 8/3.
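
The two-coin result can be checked without a computer algebra system. The Python sketch below uses the closed form P(Y = y) = (2^(y+1) - 3)/4^y and, as an independent cross-check that is not in the text, the fact that both coins have shown heads by toss y with probability (1 - 2^(-y))^2. The function names and the cutoff of 200 terms are assumptions for illustration.

```python
from fractions import Fraction

def pmf_two_coins(y):
    """Closed form for P(Y = y) with two fair coins."""
    return Fraction(2**(y + 1) - 3, 4**y)

def pmf_from_cdf(y):
    """Same probability from P(Y <= y) = (1 - 2**-y)**2 (assumed cross-check)."""
    def cdf(k):
        return (1 - Fraction(1, 2**k))**2
    return cdf(y) - cdf(y - 1)

ys = range(1, 200)  # truncation point; the neglected tail is far below float precision
agree = all(pmf_two_coins(y) == pmf_from_cdf(y) for y in ys)
mean = sum(y * pmf_two_coins(y) for y in ys)
var = sum(y * y * pmf_two_coins(y) for y in ys) - mean**2
print(agree, float(mean), float(var))
```

Both printed moments should be close to 8/3, in line with the value quoted above.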


Example 2.9.2 


A fair coin is tossed repeatedly, and a running count of the number of heads and tails 
obtained is made. What is the probability the heads count reaches 5 before the tails count 
reaches 3? 

Clearly the last toss must result in the fifth head, which can be preceded by exactly 0, 1,
or 2 tails. Each of these probabilities is a negative binomial probability.

Let X denote the total number of tosses necessary and let j denote the number of tails.
Then, by the negative binomial distribution,

$$P(\text{5 heads before 3 tails}) = \sum_{j=0}^{2} \binom{4+j}{4} \left(\frac{1}{2}\right)^{5+j} = \frac{29}{128}.$$

It may be easier to see the structure of the answer if the coin is loaded. Let p denote
the probability of heads. Then, reasoning as above,

$$P(\text{5 heads before 3 tails}) = \sum_{j=0}^{2} \binom{4+j}{4} p^{5} q^{j}$$

and

$$P(\text{heads count reaches } h \text{ before the tails count reaches } t) = \sum_{j=0}^{t-1} \binom{h-1+j}{h-1} p^{h} q^{j}.$$
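
The general formula for the heads count reaching h before the tails count reaches t is easy to evaluate exactly. The sketch below is one way to do so with exact fractions; the function name `heads_before_tails` is a label introduced here, not one used in the text.

```python
from fractions import Fraction
from math import comb

def heads_before_tails(h, t, p):
    """Sum over j = 0..t-1 tails preceding the h-th head of C(h-1+j, j) p^h q^j."""
    q = 1 - p
    return sum(comb(h - 1 + j, j) * p**h * q**j for j in range(t))

fair = heads_before_tails(5, 3, Fraction(1, 2))
print(fair)  # 29/128

# Swapping the roles of heads and tails gives the complementary event.
print(fair + heads_before_tails(3, 5, Fraction(1, 2)))  # 1
```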


Example 2.9.3 Candy Jars 


A professor has two jars of candy on his desk. When a student enters his office, he or she
is invited to choose a jar at random and then select a piece of candy. After some time, one
of the jars will be found empty. At that time, on average, how many pieces of candy are in
the remaining jar?

The problem appears in the literature as Banach’s Match Book Problem after the 
famous Polish mathematician. It is an instance of Example 2.9.2. 

We specialize the problem to two jars, each jar initially containing n pieces of candy 
and we further suppose that each jar is selected with probability 1/2. 

Consider either of the jars; call it, for convenience, the first jar; suppose we empty it and
then at some subsequent selection, choose it again and find that it is empty. Suppose further
that the remaining jar at that point has x pieces of candy in it. Thus, the first n + (n - x)
selections involve choosing the first jar exactly n times, and the next choice, the one that
finds the jar empty, must again be the first jar. Since the jars are symmetric and it makes
no difference which we designate as the first jar,

$$P(X = x) = 2 \binom{2n-x}{n} \left(\frac{1}{2}\right)^{2n-x+1}, \quad x = 0, 1, 2, \ldots, n. \tag{2.11}$$

A graph of this probability distribution function, for n = 15, is shown in Figure 2.14.
It shows that the most probable value for X is x = 0 or x = 1, and that the probabilities
decrease steadily as x increases.


Figure 2.14 The candy jars problem for n = 15.
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From the arguments used to establish (2.11), it follows that 


$$\sum_{x=0}^{n} 2 \binom{2n-x}{n} \left(\frac{1}{2}\right)^{2n-x+1} = 1.$$


A direct analytic proof of this is challenging. Finding the mean and variance is similarly 
difficult, so we show a way to find these using a recursion. (This method was also used to 
establish the mean and variance of the binomial distribution and is generally applicable to 
other discrete distributions.) 
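
Although an analytic proof is challenging, the claim that the probabilities in (2.11) sum to 1 is easy to confirm for a particular n. The following Python sketch does this exactly for n = 15 using rational arithmetic; the function name `candy_pmf` is introduced here for illustration.

```python
from fractions import Fraction
from math import comb

def candy_pmf(x, n):
    """P(X = x) from (2.11): x pieces remain when a jar is first found empty."""
    return 2 * comb(2 * n - x, n) * Fraction(1, 2)**(2 * n - x + 1)

n = 15
probs = [candy_pmf(x, n) for x in range(n + 1)]
print(sum(probs))            # 1
print(probs[0] == probs[1])  # True: x = 0 and x = 1 are equally likely
```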


A Recursion 
It is easy to use (2.11) to show that

$$\frac{P(X = x)}{P(X = x-1)} = \frac{2(n-x+1)}{2n-x+1}, \quad x = 1, 2, \ldots, n. \tag{2.12}$$

This can also be written as

$$\frac{P(X = x)}{P(X = x-1)} = 1 - \frac{x-1}{2n-(x-1)}, \quad x = 1, 2, \ldots, n,$$

showing that the probabilities decrease as x increases and that the most probable value is
x = 0 or x = 1.

Now we seek the mean and the variance. Rearranging and summing (2.12) from 1 to
n (the region of validity for the recursion), we have

$$\sum_{x=1}^{n} (2n-x+1) \cdot P(X = x) = 2 \cdot \sum_{x=1}^{n} (n-x+1) \cdot P(X = x-1).$$

This in turn can be written as

$$(2n+1) \cdot [1 - P(X = 0)] - E(X) = 2n \cdot [1 - P(X = n)] - 2 \cdot [E(X) - n \cdot P(X = n)].$$

Simplifying and rearranging give

$$E(X) = (2n+1) \cdot \binom{2n}{n} \left(\frac{1}{2}\right)^{2n} - 1.$$


E(X) is approximately a linear function of n as Figure 2.15 shows. 
To find the variance of X, we first find E(X^2). It follows from recursion (2.12) that

$$\sum_{x=1}^{n} x \cdot (2n-x+1) \cdot P(X = x) = 2 \sum_{x=1}^{n} x \cdot (n-x+1) \cdot P(X = x-1).$$

Figure 2.15 E(X) for the candy jars problem.

The left-hand side reduces to (2n + 1) · E(X) - E(X^2) while the right-hand side can be
written as

$$2n \cdot \sum_{x=1}^{n} (x-1) \cdot P(X = x-1) + 2n \cdot \sum_{x=1}^{n} P(X = x-1) - 2 \sum_{x=1}^{n} x \cdot (x-1) \cdot P(X = x-1),$$

which becomes

$$2n \cdot [E(X) - n \cdot P(X = n)] + 2n \cdot [1 - P(X = n)] - 2E[X(X+1)] + 2n(n+1)P(X = n).$$

It then follows that

$$\mathrm{Var}(X) = 2(n+1) - (2n+1) \cdot \binom{2n}{n} \left(\frac{1}{2}\right)^{2n} - \left[(2n+1) \cdot \binom{2n}{n} \left(\frac{1}{2}\right)^{2n}\right]^{2}.$$


This is an increasing function of n. A graph is shown in Figure 2.16. 


Figure 2.16 Variance in the candy jars problem.
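
The closed forms obtained from the recursion can be compared with moments computed directly from (2.11). The sketch below does the comparison in exact arithmetic; the function names are introduced here for illustration, and the closed forms are written with c = (2n + 1) C(2n, n) (1/2)^(2n), so that E(X) = c - 1 and Var(X) = 2(n + 1) - c - c^2.

```python
from fractions import Fraction
from math import comb

def candy_pmf(x, n):
    """P(X = x) from (2.11)."""
    return 2 * comb(2 * n - x, n) * Fraction(1, 2)**(2 * n - x + 1)

def moments_direct(n):
    """Mean and variance summed directly from the probabilities."""
    mean = sum(x * candy_pmf(x, n) for x in range(n + 1))
    second = sum(x * x * candy_pmf(x, n) for x in range(n + 1))
    return mean, second - mean**2

def moments_closed_form(n):
    """Mean and variance from the formulas obtained through the recursion."""
    c = (2 * n + 1) * comb(2 * n, n) * Fraction(1, 2)**(2 * n)
    return c - 1, 2 * (n + 1) - c - c**2

for n in (1, 5, 15):
    mean, var = moments_closed_form(n)
    print(n, moments_direct(n) == (mean, var), float(mean), float(var))
```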
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EXERCISES 2.9 


1. A fair die is thrown until a 6 appears. Find the probability this occurs in 5 tosses.


2. I have 6 pairs of socks randomly distributed in a drawer. They are drawn out one at a
time until a pair occurs. Find the probability this happens in 3 draws. (The reader may
also wish to consult Chapter 7.)


3. A coin, loaded to come up heads 2/3 of the time, is thrown until heads appear. What is
the probability an odd number of tosses is necessary?


4. The coin in problem 3 is now tossed until the fifth head appears. What is the probability
this will occur in at most 9 tosses?


5. The probability of a successful rocket launching is 0.8, the process following the binomial
assumptions.
(a) Find the probability the first successful launch occurs at the fourth attempt.
(b) Suppose now that attempts are made until 3 successful launchings have occurred.
What is the probability that exactly 6 attempts will be necessary?


6. A box of manufactured parts contains four good and three defective parts. They are
drawn out one at a time, without replacement. Let X denote the number of the drawing
on which the first defective part occurs.
(a) Find the probability distribution for X.
(b) Find E(X).


7. The probability a player wins a game at a single trial is 1/3. Assume the trials follow the
binomial assumptions. If the player plays until he wins, find the probability the number
of trials is divisible by 4.


8. The probability a new driver will pass a driving test is 0.8.
(a) One student takes the test until she passes it. What is the probability it will take at
least two attempts to pass the test?
(b) Now suppose three students take the driving test until each has passed it. What is
the probability that exactly one of the three will take at least two attempts before
passing the test? (Assume independence.)


9. To become an actuary, one must pass a series of 9 examinations. Suppose that 60% of
those taking each examination pass it and that the examinations are passed independently
of each other. What is the probability a person passes the 9th examination, and so has
passed all the examinations, on the 15th attempt?


10. A quality control inspector on a production line samples items until a defective item is
found.
(a) If the probability an item is defective is 0.08, what is the probability that at least
10 items must be inspected?
(b) Suppose now that the 16th item inspected is the first defective item found. If p is the
probability an item is defective, what value of p makes the event that the 16th item
inspected is the first defective item found most likely?


11. A fair coin is tossed. What is the probability the fourth head is preceded by at most two
tails?


12. A TV interviewer must conduct five interviews. Suppose the probability a person agrees
to be interviewed is 2/3.
(a) What is the probability the interviewer will ask 9 people in all to be interviewed?
(b) How many people can the interviewer expect to ask to be interviewed?


13. In August, the probability a thunderstorm will occur on any particular day is 0.1. What 
is the probability the first thunderstorm in August will occur on August 12? 


14. In a manufacturing process, the probability a produced item is good is 0.97. Assuming 
the items produced are independent, what is the probability that exactly five defective 
items precede the 100th good item? 


15. A box contains six good and four defective items. Items are drawn out one at a time, 
without replacement. 


(a) Find the probability the third defective item occurs on the fifth draw. 
(b) On what drawing is it most likely for the third defective to occur? 


16. A coin, loaded to come up heads with probability 3/4, is tossed until heads appear or 
until it has been tossed five times. Find the probability the experiment will end in an 
odd numbered toss, given that the experiment takes more than one toss. 


17. Suppose you are allowed to flip a fair coin until the first head appears. Let X denote the
total number of flips required.

(a) Suppose you win 2^X dollars if X ≤ 19 and 2^20 dollars if X ≥ 20 for playing the game.
A game is fair if the amount paid to play the game equals the expected winnings. How
much should you pay to play this game if it is fair?

(b) Suppose now that you win 2^X dollars regardless of the number of flips. Can the game be
made fair?


18. Use the recursion (2.12) to find the most likely number of pieces of candy remaining 
when one of the candy jars is found empty. 


19. X is a negative binomial random variable with p as the probability of success at any
trial. Suppose the rth success occurs at trial t. Find the value of p that makes this event
most likely.


2.10 THE HYPERGEOMETRIC RANDOM VARIABLE: 
ACCEPTANCE SAMPLING 


Acceptance Sampling 


Products produced from industrial processes are often subjected to sampling inspection
before they are delivered to the customer. This sampling is done to ensure a level of quality
in delivered manufactured products and to ensure some uniformity in the product. Usually,
unacceptable product (product that does not meet the manufacturer’s specifications)
becomes mixed up with acceptable product due to changes in the manufacturing process
and random events in that process. Modern techniques of statistical process control have
greatly improved the quality of manufactured products, and while it is best to produce only
flawless products, often the quality of the product can only be determined through sampling.
However, determining whether a product is acceptable or unacceptable may destroy
the product. Because of the time and money involved in inspecting the product in its entirety,
even if destruction of the product is not involved, sampling plans, which inspect only a sample
of the product, are often employed. It has also been found that sampling is often more
accurate than 100% inspection since the inspection of each and every item demands constant
attention. Boredom or lack of care often sets in, which is not the case when smaller
samples are randomly chosen at random times. As we will see, probability theory renders
100% inspection unnecessary even when it is possible, so total inspection of a manufactured
product has become rare.

Due to the emphasis on quality in manufacturing and statistical process control, prob- 
ability theory has become extremely important in industry. 

The chance a sample has a given composition can be determined from probability the- 
ory. As an example, suppose we have a lot (a number of produced items) containing eight 
acceptable, or good, items as well as four unacceptable items. A sample of three items is 
drawn. What is the probability the sample contains exactly one unacceptable item? 

The sampling is done without replacement (since one would not want to inspect the
same item repeatedly!), and since the order in which the items are drawn is of no importance,
there are $\binom{12}{3} = 220$ samples comprising the sample space. If the sampling plan, that is, the
manner in which the sampled items are drawn, is appropriate, we consider each of these
samples to be equally likely. Now, we must count the number of samples containing exactly
one unacceptable item (and so exactly two acceptable items). There are $\binom{4}{1} \cdot \binom{8}{2} = 112$ such
samples. So the probability the sample contains exactly one unacceptable item is

$$\frac{\binom{4}{1} \cdot \binom{8}{2}}{\binom{12}{3}} = \frac{112}{220} = 0.509.$$

The probability the sample contains no unacceptable items is $\binom{8}{3} / \binom{12}{3} = 56/220 = 0.255$,
so the probability that the sampling plan will detect at least one unacceptable item is

$$1 - \frac{\binom{8}{3}}{\binom{12}{3}} = 0.745.$$

Our sampling plan is then likely to detect at least one of the unacceptable items in the 
lot, but it is not certain to do so. 

Let us suppose that we carry out the earlier inspection plan and decide to sell the entire
lot only if no unacceptable items are found in the sample. The probability this lot survives
this sampling plan and is sold is 0.255. So about 26% of the time, lots with 4/12 = 33 1/3%
unacceptable items will be sold.
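
The counts in this small example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic above; the variable names are chosen here for illustration.

```python
from math import comb

good, bad, sample_size = 8, 4, 3

total_samples = comb(good + bad, sample_size)             # all samples of 3 from 12
one_bad = comb(bad, 1) * comb(good, sample_size - 1)      # exactly one unacceptable item
p_one_bad = one_bad / total_samples
p_none_bad = comb(good, sample_size) / total_samples      # lot passes inspection
p_detect = 1 - p_none_bad                                 # at least one unacceptable item

print(total_samples, one_bad)
print(round(p_one_bad, 3), round(p_none_bad, 3), round(p_detect, 3))
```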


Usually, then, the sampling plan will detect some unacceptable items, which are
not sent to the customer. One of two courses of action is generally pursued at this point.
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Either the unacceptable items in the sample are replaced with good items or the entire lot 
is inspected and any unacceptable items in the lot are replaced by good items. Either of 
these plans will improve the quality of the product sold, the second being the better if it can 
be carried out. In case the testing is destructive, only the first plan can be executed. Let us 
compare the plans in this case, assuming that either can be carried out. 

We start by replacing only the unacceptable items in the sample.

The sample contains no unacceptable items with probability $\binom{8}{3} / \binom{12}{3} = \frac{14}{55}$, so the outgoing
lot will contain 4/12 or 1/3 unacceptable items with this probability.

The sample contains exactly one unacceptable item with probability

$$\frac{\binom{4}{1} \cdot \binom{8}{2}}{\binom{12}{3}} = \frac{28}{55},$$

producing 3/12 or 1/4 unacceptable items in the outgoing lot.

The sample contains exactly two unacceptable items with probability

$$\frac{\binom{4}{2} \cdot \binom{8}{1}}{\binom{12}{3}} = \frac{12}{55},$$

producing 2/12 or 1/6 unacceptable items in the outgoing lot.

Finally, the sample contains exactly three unacceptable items with probability $\binom{4}{3} / \binom{12}{3} = \frac{1}{55}$,
resulting in 1/12 unacceptable items in the outgoing lot.

The result of this plan is that, on average, the percentage of unacceptable items the lot
will contain is

$$\frac{14}{55} \cdot \frac{1}{3} + \frac{28}{55} \cdot \frac{1}{4} + \frac{12}{55} \cdot \frac{1}{6} + \frac{1}{55} \cdot \frac{1}{12} = 25\%.$$

This is considerably less than the 33 1/3% unacceptable items in the lot. Sampling 
cannot improve the quality of the product manufactured, but it can, and does, improve the 
quality of the product sold. In fact, dramatic gains can be made by this process, which we 
will call acceptance sampling. 

Even greater gains can be attained if, when the sample contains at least one unaccept- 
able item, the entire lot is inspected and any unacceptable items in the lot are replaced by 
good items. In that circumstance, either the lot sold is 100% good (with probability 0.745) 
or the lot contains 4/12 = 33 1/3% unacceptable items. Then the average percentage of 
unacceptable items sold is 


$$0\% \cdot 0.745 + 33\tfrac{1}{3}\% \cdot 0.255 = 8.5\%.$$


This is a dramatic gain and, as we shall see, is often possible if acceptance sampling is 
employed. 

The average percentage of unacceptable product sold is called the average outgoing 
quality (AOQ). The AOQ, if only unacceptable items in the sample are replaced before the 
lot is sold, is 25%. 
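
Both average outgoing quality figures for the 12-item lot can be recomputed exactly. In the sketch below, `p_sample` and the two variable names for the plans are labels introduced here; the first plan replaces only unacceptable items found in the sample, the second inspects the whole lot whenever the sample contains an unacceptable item.

```python
from fractions import Fraction
from math import comb

N, D, n = 12, 4, 3  # lot size, unacceptable items in the lot, sample size

def p_sample(x):
    """Probability the sample contains exactly x unacceptable items."""
    return Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))

# Plan 1: replace only the unacceptable items found in the sample.
aoq_sample_only = sum(p_sample(x) * Fraction(D - x, N) for x in range(n + 1))

# Plan 2: inspect and repair the whole lot unless the sample is clean.
aoq_full_lot = p_sample(0) * Fraction(D, N)

print(aoq_sample_only, float(aoq_sample_only))
print(aoq_full_lot, float(aoq_full_lot))
```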

Lots are rarely so small as in our example, so we must investigate the behavior of the 
above-mentioned sampling plan when the lots are large. Before doing that, we define the 
relevant random variable and determine some of its properties. 


The Hypergeometric Random Variable 


We generalize the earlier situation to a lot of N items, D of which are unacceptable. Let X
denote the number of unacceptable items in the randomly chosen sample of n items. Then

$$P(X = x) = \frac{\binom{D}{x} \cdot \binom{N-D}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, 2, \ldots, \mathrm{Min}\{n, D\}. \tag{2.13}$$

We assume that Min{n, D} = n in what follows. The argument is similar if
Min{n, D} = D.

If X has the probability distribution given by (2.13), then X is called a hypergeometric
random variable.

Since $\sum_{x=0}^{n} \binom{D}{x} \cdot \binom{N-D}{n-x}$ represents all the mutually exclusive ways in which x unacceptable
items and n - x acceptable items can be chosen from a group of N items, this sum
must be $\binom{N}{n}$, showing that the sum of the probabilities in (2.13) must be 1.

We will use a recursion to find the mean and variance. Let G = N - D. Then,
from (2.13),

$$\frac{P(X = x)}{P(X = x-1)} = \frac{(D-x+1)(n-x+1)}{x(G-n+x)}, \quad x = 1, 2, \ldots, n. \tag{2.14}$$

So

$$(G-n) \sum_{x=1}^{n} x P(X = x) + \sum_{x=1}^{n} x^{2} P(X = x) = \sum_{x=1}^{n} (D-x+1)(n-x+1) P(X = x-1).$$

After expanding and simplifying the sums involved, we find that

$$E(X) = n \cdot \frac{D}{N}.$$

This result is analogous to the mean of the binomial, np, but here D/N is the probability
of drawing an unacceptable item on the first draw only; the probabilities for later draws
depend on the earlier ones. It is surprising that the nonreplacement does
not affect the mean value. The drawings for the hypergeometric are clearly dependent, a
fact that will affect the variance.
To find E(X^2), multiply (2.14) through by x, giving

$$(G-n) \sum_{x=1}^{n} x^{2} P(X = x) + \sum_{x=1}^{n} x^{3} P(X = x) = \sum_{x=1}^{n} x (D-x+1)(n-x+1) P(X = x-1).$$

These quantities can be expanded and simplified using the result for E(X). We find that

$$E(X^{2}) = \frac{nD}{N(N-1)} \cdot (nD - n - D + N),$$

from which it follows that

$$\mathrm{Var}(X) = n \cdot \frac{D}{N} \cdot \frac{N-D}{N} \cdot \frac{N-n}{N-1}.$$

This result is analogous to the variance, n · p · q, of the binomial but involves a factor,
(N - n)/(N - 1), often called a finite population correction factor, due to the fact that the drawings are
not independent.
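
The formulas for E(X), E(X^2), and Var(X) can be verified exactly for particular lots. The sketch below checks them for the 12-item lot used earlier (N = 12, D = 4, n = 3); the function names are introduced here for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, D, n):
    """P(X = x) from (2.13)."""
    return Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))

def hypergeom_moments(N, D, n):
    """E(X), E(X^2), and Var(X) summed directly from the probabilities."""
    xs = range(0, min(n, D) + 1)
    mean = sum(x * hypergeom_pmf(x, N, D, n) for x in xs)
    second = sum(x * x * hypergeom_pmf(x, N, D, n) for x in xs)
    return mean, second, second - mean**2

N, D, n = 12, 4, 3
mean, second, var = hypergeom_moments(N, D, n)
print(mean == Fraction(n * D, N))
print(second == Fraction(n * D * (n * D - n - D + N), N * (N - 1)))
print(var == Fraction(n * D * (N - D) * (N - n), N * N * (N - 1)))
```

All three comparisons should print True; the variance here is 6/11, smaller than the binomial value npq = 2/3 because of the finite population correction factor 9/11.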

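The correction factor, however, approaches 1 as N → ∞ and so the variance of the
hypergeometric approaches that of the binomial. This result, together with the mean value,
suggests that the hypergeometric distribution can be approximated by the binomial distribution
as the population size, N, increases. This is due to the fact that as N increases, the
nonreplacement of the items drawn has less and less effect on the probabilities involved.
We pause here to show that this is indeed the case.

We begin with

$$P(X = x) = \frac{\binom{D}{x} \cdot \binom{N-D}{n-x}}{\binom{N}{n}},$$

which can be written as

$$P(X = x) = \frac{D(D-1)(D-2) \cdots (D-x+1)}{x!} \cdot \frac{(N-D)(N-D-1) \cdots (N-D-n+x+1)}{(n-x)!} \cdot \frac{n!}{N(N-1)(N-2) \cdots (N-n+1)}.$$

This in turn can be rearranged as

$$P(X = x) = \binom{n}{x} \cdot \frac{D}{N} \cdot \frac{D-1}{N-1} \cdots \frac{D-x+1}{N-x+1} \cdot \frac{N-D}{N-x} \cdot \frac{N-D-1}{N-x-1} \cdots \frac{N-D-n+x+1}{N-n+1}.$$

Approximating each of the factors

$$\frac{D}{N}, \frac{D-1}{N-1}, \ldots, \frac{D-x+1}{N-x+1} \quad \text{by} \quad \frac{D}{N}$$

and each of the factors

$$\frac{N-D}{N-x}, \frac{N-D-1}{N-x-1}, \ldots, \frac{N-D-n+x+1}{N-n+1} \quad \text{by} \quad \frac{N-D}{N}$$

means that

$$P(X = x) \approx \binom{n}{x} \left(\frac{D}{N}\right)^{x} \left(\frac{N-D}{N}\right)^{n-x},$$

which is the binomial distribution.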
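
To see how good the binomial approximation is, one can compare the two sets of probabilities directly. The sketch below measures the largest absolute difference between the hypergeometric probabilities and the binomial probabilities with p = D/N, for a small lot (N = 12, D = 4, n = 3) and a large one (N = 1000, D = 400, n = 30). The function names, and the choice of the largest gap as the measure, are assumptions made here for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, D, n):
    """Hypergeometric probability (2.13), converted to a float at the end."""
    return float(Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n)))

def binomial_pmf(x, n, p):
    """Binomial probability with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def max_gap(N, D, n):
    """Largest absolute difference between the two distributions."""
    p = D / N
    return max(abs(hypergeom_pmf(x, N, D, n) - binomial_pmf(x, n, p))
               for x in range(0, min(n, D) + 1))

print(max_gap(12, 4, 3))       # small lot: noticeable difference
print(max_gap(1000, 400, 30))  # large lot: much smaller difference
```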
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Figure 2.17 Hypergeometric distribution with N = 12,n = 3, and D = 4. 
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Figure 2.18 Hypergeometric distribution with N = 1000,n = 30, and D = 400. 


Some Specific Hypergeometric Distributions 


It is useful at this point to look at some specific hypergeometric distributions. Our initial
example, in Section 2.10, had N = 12, n = 3, and D = 4. A graph of the probability distribution
is shown in Figure 2.17.

As the population size increases, we expect the hypergeometric distribution to appear
more binomial or normal-like. Figure 2.18 shows that this is the case. Here, N = 1000,
D = 400, and n = 30.

While Figure 2.17 shows no particular features, Figure 2.18 shows again the now familiar
normal appearance.


EXERCISES 2.10 


1. A carton of 12 light bulbs contains 1 defective bulb. A sample of 3 bulbs is chosen.
What is the probability the sample contains the defective bulb?


2. Let X denote the number of defective bulbs in the sample in problem 1. Find E[X].


3. A lot of 50 fuses is known to contain 7 defectives. A random sample of size 10 is drawn
without replacement. What is the probability the sample contains at least 1 defective
fuse?


4. A collection of 30 gems, all of which are identical in appearance and are supposed to
be genuine diamonds, actually contains 8 worthless stones. The genuine diamonds are
valued at $1200 each. Two gems are selected.
(a) Let X denote the total actual value of the gems selected. Find the probability distribution
function for X.
(b) Find E(X).


5. (a) A box contains three red and five blue marbles. The marbles are drawn out one at
a time and without replacement, until all of the red marbles have been selected.
Let X denote the number of drawings necessary. Find the probability distribution
function for X.
(b) Find the mean and variance for X.


6. (a) A box contains three red and five blue marbles. The marbles are drawn out one
at a time and without replacement, until all the marbles left in the box are of the
same color. Let X denote the number of drawings necessary. Find the probability
distribution function for X.
(b) Find the mean and variance for X.


7. A lot of 400 automobile tires contains 10 with blemishes that cannot be sold at full
price. A sampling inspection plan chooses 5 tires at random and accepts the lot only if
the sample contains no tires with blemishes.
(a) Find the probability the lot is accepted.
(b) Suppose any tires with blemishes in the sample are replaced by good tires if the
lot is rejected. Find the AOQ of the lot.


8. A sample of size 4 is chosen from a lot of 25 items of which D are defective. Draw the
curve showing the probability the lot is accepted as a function of D if the lot is accepted
only when the sample contains no defective items.


9. A lot of 250 items which contains 15 defective items is subject to an acceptance sampling
plan that calls for a sample of size 6 to be drawn. The lot is accepted if the sample
contains at most 1 defective item.
(a) Find the probability the lot is accepted.
(b) Suppose any defective items in the sample are replaced by good items. Find
the AOQ.


10. In problem 7, suppose now that the entire lot is inspected and any blemished tires
replaced by good tires if the lot is rejected by the sample. Find the AOQ.


11. In problem 9, if any defective items in the lot are replaced by good items when the
sample rejects the entire lot, find the AOQ.

12. Exercises 5 and 6 can be generalized. Suppose a box has a red and b blue marbles and
that X is the number of drawings necessary to draw out all of the red marbles.

(a) Show that

$$P(X = x) = \frac{\binom{x-1}{a-1}}{\binom{a+b}{a}}, \quad x = a, a+1, \ldots, a+b.$$

(b) Using the result in part (a), show that a recursion can be simplified to

$$\frac{P(X = x)}{P(X = x-1)} = \frac{x-1}{x-a}, \quad x = a+1, a+2, \ldots, a+b.$$

(c) Show that the recursion in part (b) leads to

$$\sum_{x=a+1}^{a+b} x \cdot (x-a) \cdot P(X = x) = \sum_{x=a+1}^{a+b} x \cdot (x-1) \cdot P(X = x-1).$$

From this, conclude that

$$E(X) = a \cdot \frac{a+b+1}{a+1}.$$

(d) Show that

$$\mathrm{Var}(X) = \frac{a \cdot b \cdot (a+b+1)}{(a+1)^{2} \cdot (a+2)}.$$


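13. (Exercise 12 continued) Now suppose X represents the number of drawings until all
the marbles remaining in the box are of the same color. Show that

$$P(X = x) = \frac{\binom{x-1}{a-1} + \binom{x-1}{b-1}}{\binom{a+b}{a}}, \quad x = \mathrm{Min}[a, b], \ldots, a+b-1,$$

and that

$$E(X) = \frac{a \cdot b}{a+1} + \frac{a \cdot b}{b+1}.$$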


14. A box contains three red and five blue marbles. The marbles are drawn out one at a 
time without replacement until a red marble is drawn. Let X denote the total number 
of drawings necessary. 

(a) Find the probability distribution function for X. 
(b) Find the mean and the variance of X. 
15. Exercise 14 is generalized here. Suppose a box contains a red and b blue marbles,
and that X denotes the total number of drawings made without replacement until a red
marble is drawn.

(a) Show that

$$P(X = x) = \frac{\binom{a+b-x}{a-1}}{\binom{a+b}{a}}, \quad x = 1, 2, \ldots, b+1.$$

(b) Using the result in part (a), show that a recursion can be simplified to

$$\frac{P(X = x)}{P(X = x-1)} = \frac{b-x+2}{a+b-x+1}, \quad x = 2, 3, \ldots, b+1.$$

(c) Use the recursion in part (b) to show that

$$E(X) = \frac{a+b+1}{a+1} \quad \text{and} \quad V(X) = \frac{a \cdot b \cdot (a+b+1)}{(a+1)^{2} \cdot (a+2)}.$$

(d) Show that the mean and variance in part (c) approach the mean and variance of the
geometric random variable as both a and b become large.


2.11 ACCEPTANCE SAMPLING (CONTINUED)


We considered an acceptance sampling plan in section “Acceptance Sampling”, and we 
saw that some gains can be made with respect to the average quality delivered when the 
unacceptable items in either the sample or in the entire lot are replaced with good items. 
We can now discuss some specific results, dealing with lots that are usually large. We first 
consider the effect of the size of the sample on the process. 


Example 2.11.1 


A lot of 200 items is inspected by drawing a sample of size n without replacement; the
lot is accepted only if all the items in the sample are good. Suppose the lot contains
2%, or 4, unacceptable items. Then the probability the lot is accepted by this sampling
plan is

$$\frac{\binom{196}{n}}{\binom{200}{n}}.$$

This is a steadily decreasing function of n, as we would expect. We find that if n = 5,
the probability the lot is accepted is 0.903, while if n = 30, this probability is 0.519. A graph
of this function is shown in Figure 2.19.

Not surprisingly, large samples yield more accurate results than small samples.
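
The two acceptance probabilities quoted in Example 2.11.1 can be reproduced directly from the formula. In this sketch the function name `accept_prob` and its default arguments are choices made here for illustration.

```python
from math import comb

def accept_prob(n, N=200, D=4):
    """Probability a sample of size n from a lot of N with D unacceptable items is all good."""
    return comb(N - D, n) / comb(N, n)

print(round(accept_prob(5), 3))   # 0.903
print(round(accept_prob(30), 3))  # 0.519

# The probability of acceptance decreases steadily as the sample size grows.
values = [accept_prob(n) for n in range(1, 31)]
print(all(a > b for a, b in zip(values, values[1:])))  # True
```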


Example 2.11.2 


Now we consider the effect of the quality of the lot on the probability of acceptance. Suppose
a proportion p of a lot of 1000 items is unacceptable. The sampling plan is this: select a sample of
100 and accept the lot if the sample contains at most 4 unacceptable items. The probability
the lot is accepted is then

$$\sum_{x=0}^{4} \frac{\binom{1000p}{x} \cdot \binom{1000-1000p}{100-x}}{\binom{1000}{100}}.$$

Figure 2.19 Effect of sample size, n, on a sampling plan.

This is a decreasing function of the proportion of unacceptable items in the lot. These
values are easily calculated. If, for example, the lot contains 10 unacceptable items, then
the probability the lot is accepted is 0.9985.

A graph of this probability as a function of p is shown in Figure 2.20.
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Figure 2.20 Effect of quality in the lot on the probability of acceptance. 


The curve in Figure 2.20 is called the operating characteristic (or OC) curve for the 
sampling plan. Sampling plans are often compared by comparing the rapidity with which 
the OC curves for different plans decrease. 

In this case, the sample size is small relative to the population size, so we would expect 
that the nonreplacement of the sample items will have little effect on the probability the lot 
is accepted. A binomial model approximates the probability the lot is accepted if in fact 
it contains 10 unacceptable items as 0.9966 (we found the exact probability above to be 
0.9985). 
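
The exact and approximate acceptance probabilities for a lot containing 10 unacceptable items can be recomputed as follows. This is a sketch under the plan stated in Example 2.11.2 (lot of 1000, sample of 100, accept with at most 4 unacceptable items); the function names and default arguments are introduced here for illustration.

```python
from math import comb

def accept_hypergeom(D, N=1000, n=100, c=4):
    """Exact probability of at most c unacceptable items in the sample."""
    ways = sum(comb(D, x) * comb(N - D, n - x) for x in range(c + 1))
    return ways / comb(N, n)

def accept_binomial(p, n=100, c=4):
    """Binomial approximation, treating the draws as independent."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c + 1))

print(round(accept_hypergeom(10), 4))   # exact value, compare with 0.9985
print(round(accept_binomial(0.01), 4))  # approximation, compare with 0.9966

# A few points on the operating characteristic curve.
for D in (10, 30, 50, 100):
    print(D, round(accept_hypergeom(D), 4))
```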




Producer’s and Consumer's Risks 


Acceptance sampling involves two types of risk: the producer would like to guard against 
a “good” lot being rejected, although this cannot be guaranteed; the consumer, on the other 
hand, wants to guard against a “poor” lot being accepted by the sampling plan, although, 
again, this cannot be guaranteed. 

The words “good” and “poor” of course must be decided in the context of the practical 
situation. Often when these are defined and the probability of the risks set, a sampling plan 
can be devised (specifically, a sample size can be determined) that, at least approximately, 
meets the risks set. 

Consider Example 2.11.1 again. Here the lot size is 200, but suppose that D of these
items are unacceptable. Again we draw a sample of size n and accept the lot when the
sample contains no unacceptable items. So

$$P(\text{lot is accepted}) = \frac{\binom{200-D}{n}}{\binom{200}{n}}.$$

Figure 2.21 shows this probability as a function of the sample size, n, where D has 
been varied from 0 to 25. 

Since the curves are monotonically decreasing, it is often possible to select a curve
(thus, determining a sample size) that passes through two given points. If the producer
would like lots with exactly 1 unacceptable item rejected with probability 0.10 (so such lots
are accepted with probability 0.90) and if the consumer would like lots with 24 unacceptable
items rejected with probability 0.95 (so such lots are accepted with probability 0.05), we
find a sample size of 22 will approximate these restrictions. To check this, note that

$$P(\text{lot is accepted} \mid D = 1) = \frac{\binom{199}{22}}{\binom{200}{22}} = 0.89$$

and

$$P(\text{lot is accepted} \mid D = 24) = \frac{\binom{176}{22}}{\binom{200}{22}} \approx 0.05.$$

Figure 2.21 Some operating characteristic curves.



A computer algebra system here is of great help in finding an approximate solution to 
the problem. 
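
As an illustration of the kind of search a computer makes easy, here is a minimal Python sketch. It assumes the zero-acceptance plan above (lot size 200, accept only when the sample is clean) and, as an assumption of this sketch rather than a rule from the text, picks the sample size that minimizes the sum of the absolute misses at the two risk points. The names `p_accept`, `targets`, and `total_miss` are illustrative.

```python
from math import comb

def p_accept(N, D, n):
    """P(a sample of n from a lot of N with D unacceptable items contains none)."""
    return comb(N - D, n) / comb(N, n)

N = 200
# Risk points from the text: accept with probability 0.90 when D = 1
# and with probability 0.05 when D = 24.
targets = {1: 0.90, 24: 0.05}

def total_miss(n):
    # Assumed criterion: sum of absolute deviations from the two targets.
    return sum(abs(p_accept(N, D, n) - p) for D, p in targets.items())

best_n = min(range(1, N + 1), key=total_miss)
print(best_n, p_accept(N, 1, best_n), p_accept(N, 24, best_n))
```

Other criteria (for example, requiring both risks to be met exactly or better) could select a slightly different sample size.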


Average Outgoing Quality 


We saw that considerable improvement in the quality of the product sold can be made if any 
items in the sample are replaced by good items. This is the most sensible strategy we can 
follow if the sampling is destructive; in that case we have little choice but to take a sample 
since destroyed products must be replaced by others. Recall that the AOQ is the percentage 
of unacceptable items sold to the buyer. We want to consider the behavior of the AOQ in 
this section. 


Example 2.11.3 


Suppose a lot of 100 items actually contains 4 unacceptable items. A sample of 5 items is 
drawn and any unacceptable item in the sample is replaced by a good item. On average, 
what proportion of unacceptable items is sold using this sampling plan? 

Let X denote the number of unacceptable items in the sample. Then X is a hypergeometric
random variable and

$$P(X = x) = \frac{\binom{4}{x}\binom{96}{5 - x}}{\binom{100}{5}}, \quad x = 0, 1, \ldots, 4.$$

So the AOQ is

$$\text{AOQ} = \sum_{x=0}^{4} \frac{4 - x}{100} \cdot P(X = x)$$

since 4 - X unacceptable items will be sold. But

$$\text{AOQ} = \frac{4}{100} - \frac{1}{100} \sum_{x=0}^{4} x \cdot \frac{\binom{4}{x}\binom{96}{5 - x}}{\binom{100}{5}},$$

where the summation is the mean value of a hypergeometric random variable with
N = 100, n = 5, and D = 4. It follows that

$$\text{AOQ} = \frac{4}{100} - \frac{1}{100} \cdot 5 \cdot \frac{4}{100} = 0.038.$$

This is less than the population percentage of unacceptable items, 0.04, but not greatly
less. The sampling has improved the quality of the product sold, but not by much. The 
effect the sampling plan has will increase as the percentage of unacceptable product in the 
lot increases. 

Another possible plan is to replace each unacceptable item in the lot with a good item 
if the sample contains any unacceptable items. Now we deliver either all the unacceptable 
items in the lot or none of them. It follows that 


$$\text{AOQ} = \frac{4}{100} \cdot P(\text{sample contains no unacceptable items}) = \frac{4}{100} \cdot \frac{\binom{96}{5}}{\binom{100}{5}} = 0.032475.$$

So the gain is greater if we happen to inspect the entire lot.

These conclusions are probably not surprising. But more lurks behind the scenes here!
Let us consider a general example.
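
The two plans in Example 2.11.3 can be checked numerically. The following is a small sketch using only Python's standard library; the function name `hypergeom_pmf` and the variable names are illustrative choices, not notation from the text.

```python
from math import comb

def hypergeom_pmf(x, N, D, n):
    """P(X = x) when n items are drawn without replacement from N, D of them unacceptable."""
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

N, D, n = 100, 4, 5

# Plan 1: only the unacceptable items found in the sample are replaced,
# so D - x unacceptable items are sold.
aoq_sample_only = sum((D - x) / N * hypergeom_pmf(x, N, D, n)
                      for x in range(min(D, n) + 1))

# Plan 2: the whole lot is rectified whenever the sample shows any
# unacceptable item, so D items are sold only if the sample is clean.
aoq_rectify_lot = D / N * hypergeom_pmf(0, N, D, n)

print(round(aoq_sample_only, 6), round(aoq_rectify_lot, 6))
```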


Example 2.11.4 


From a lot of N items that contain D unacceptable items, we draw a sample of size n. 
If the sample contains any unacceptable items, we inspect the entire lot, replacing each 
unacceptable item with an acceptable, or good, item. The resulting lot then contains either 
D or 0 unacceptable items. The AOQ is then 


$$\text{AOQ} = \frac{D}{N} \cdot P(\text{sample contains no unacceptable items}) = \frac{D}{N} \cdot \frac{\binom{N - D}{n}}{\binom{N}{n}}. \tag{2.15}$$

What happens to this product as D increases? Since D/N increases as D increases and
since $\binom{N - D}{n} / \binom{N}{n}$ is a decreasing function of D, it follows that the
product in (2.15) attains a maximum value. This is true, regardless of the size of D! So, no
matter what D is, there is a limit for the percentage of unacceptable product sold! This is
called the AOQ limit. We illustrate this phenomenon in the next example.


Example 2.11.5 


Consider again the situation when N = 1000 and n = 100. Then

$$\text{AOQ} = \frac{D}{1000} \cdot \frac{\binom{1000 - D}{100}}{\binom{1000}{100}}.$$

A graph of the AOQ is shown in Figure 2.22, showing that the maximum value of the
AOQ is about 0.35%.
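
The AOQ limit in this example can also be located by direct computation rather than from a graph. The sketch below evaluates formula (2.15) for every possible D; the function name `aoq` and the dictionary `values` are illustrative.

```python
from math import comb

def aoq(D, N, n):
    """Formula (2.15): D/N times the probability that the sample is clean."""
    return D / N * (comb(N - D, n) / comb(N, n))

N, n = 1000, 100
values = {D: aoq(D, N, n) for D in range(N + 1)}

# The AOQ limit is the largest value over all possible D.
D_star = max(values, key=values.get)
print(D_star, values[D_star])
```

The maximizing D is small here because, with a sample of 100, lots containing many unacceptable items are almost always caught and rectified.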


Figure 2.22 AOQ as a function of the number of unacceptable items in the lot.


Double Sampling 


Occasionally, lots are accepted if the sample contains, say, at most c unacceptable items and 
are subject to total inspection if a sample has d or more unacceptable items, where d > c. 
Often if the number of unacceptable items falls between c and d, another sample is taken. 
We illustrate this procedure with a concrete example. 

A lot of 500 items contains 40 unacceptable items. A sample of 50 is taken and the lot 
accepted if the sample contains no more than 3 unacceptable items. If the sample contains 
4 or 5 unacceptable items, an additional sample of 30 is taken; the lot is accepted only if 
this additional sample contains no unacceptable items. Otherwise, the lot is rejected. We 
want the probability the lot is accepted. 

Let X denote the number of unacceptable items in the first sample. Then the probability
the lot is accepted on the basis of the first sample is

$$P(X \le 3) = \sum_{x=0}^{3} \frac{\binom{40}{x}\binom{460}{50 - x}}{\binom{500}{50}}.$$

Now if the first sample has 4 or 5 unacceptable items, then the second sample is taken.
The lot now contains 450 items of which 40 - X are unacceptable while 450 - (40 - X) =
410 + X are good items. The second sample of size 30 must contain only good items. So
the probability the lot is accepted on the basis of the second sample is

$$\sum_{x=4}^{5} \frac{\binom{40}{x}\binom{460}{50 - x}}{\binom{500}{50}} \cdot \frac{\binom{410 + x}{30}}{\binom{450}{30}}.$$

Figure 2.23 Probability lot is accepted in a double sampling plan.

Figure 2.24 Average outgoing quality for a double sampling plan.


The probability the lot is accepted is then

$$\sum_{x=0}^{3} \frac{\binom{40}{x}\binom{460}{50 - x}}{\binom{500}{50}} + \sum_{x=4}^{5} \frac{\binom{40}{x}\binom{460}{50 - x}}{\binom{500}{50}} \cdot \frac{\binom{410 + x}{30}}{\binom{450}{30}} = 0.445334,$$

and, assuming that none of the unacceptable items found in the samples are sold, the AOQ
is

$$\text{AOQ} = \frac{1}{500} \left[ \sum_{x=0}^{3} (40 - x) \cdot \frac{\binom{40}{x}\binom{460}{50 - x}}{\binom{500}{50}} + \sum_{x=4}^{5} (40 - x) \cdot \frac{\binom{40}{x}\binom{460}{50 - x}}{\binom{500}{50}} \cdot \frac{\binom{410 + x}{30}}{\binom{450}{30}} \right] = 0.0334579.$$

Here, the sampling plan has reduced the percentage of unacceptable items sold from
40/500 = 0.08 to 0.033 on average, so the plan is quite effective.

With the aid of a computer we can easily vary the number of unacceptable items in the 
lot and observe the effect this has on both the probability the lot is accepted and the AOQ. 
These graphs are shown in Figures 2.23 and 2.24. 

The maximum value for the AOQ is 0.0386071 and occurs when the lot contains 29 
unacceptable items. 
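
A sketch of how such a calculation might be organized is given below. It follows the double sampling plan of this section (first sample of 50, accept on at most 3, second sample of 30 when 4 or 5 are found) and assumes, as the AOQ formula above does, that unacceptable items found in the samples are not sold and that rejected lots deliver no unacceptable items. The function names and keyword parameters are illustrative.

```python
from math import comb

def hyper(x, N, D, n):
    """Hypergeometric P(X = x): n drawn from N items, D of them unacceptable."""
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

def double_plan(D, N=500, n1=50, n2=30, accept_max=3, second_max=5):
    """Return (P(lot accepted), AOQ) for a lot with D unacceptable items."""
    p_accept = 0.0
    expected_sold = 0.0          # expected number of unacceptable items sold
    for x in range(min(D, second_max) + 1):
        p_first = hyper(x, N, D, n1)
        if x <= accept_max:
            p_second = 1.0       # accepted on the first sample
        else:
            # The second sample, drawn from the remaining N - n1 items
            # (D - x of them unacceptable), must contain only good items.
            p_second = hyper(0, N - n1, D - x, n2)
        p_accept += p_first * p_second
        expected_sold += (D - x) * p_first * p_second
    return p_accept, expected_sold / N

p_acc, aoq_40 = double_plan(40)
print(round(p_acc, 6), round(aoq_40, 6))
```

Calling `double_plan(D)` over a range of D values gives the points plotted in graphs such as Figures 2.23 and 2.24, and the largest AOQ over that range can be compared with the maximum reported above.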


EXERCISES 2.11 


1. A lot of 100 produced items contains 10 defective items. A sample of 5 is chosen and 
all the defective items in the sample are replaced by good items. 
(a) Find the probability the sample contains at least 1 defective item.
(b) Find the proportion of defective items sold under this sampling plan. 

2. In problem 1, find the AOQ if all the defective items in the lot are replaced by good 
items when the sample contains any defective items. 


3. A day's production of 25 television sets from a small company has 4 sets that have
defects and cannot be sold. The company inspects its product by selecting 2 sets; if at
most 1 of these has defects, the lot is shipped. What is the probability the lot is shipped?


4. A shipment of 1500 washers contains 400 defective and 1100 nondefective items. Two 
hundred washers are chosen at random, without replacement. 
(a) Find the probability that exactly 90 defective items are found. 
(b) Approximate the probability in part (a) by using the binomial distribution. 

5. A lot of 100 fuses is inspected by a quality control engineer who tests 10 fuses selected
at random. If 2 or fewer defective fuses are discovered, the entire lot is accepted. Find
the probability the lot is accepted if it actually contains 20 defective fuses.


6. A lot of 25 items contains 4 defective items. A sample of size 2 is chosen; the lot is 

accepted if the sample shows no defective items. 

(a) Find the probability the lot is shipped. 

(b) If any defective items in the sample are replaced by good items before the lot is 
shipped, find the AOQ. 

(c) Now suppose the lot contains D defective items and that the entire lot is rectified 
if the sample shows any defective items. Plot the OC curve. 

7. A bakery has a batch of 100 cookies, 5 of which are burned. A sample of 3 cookies is 
chosen and the batch put out for sale if none of the cookies in the sample is burned. 
(a) What is the probability the batch of cookies is put out for sale? 

(b) Find the AOQ if any burned cookies in the sample are replaced by good cookies. 
(c) Assuming that the batch contains B burned cookies and that the entire batch is 
rectified if any of the cookies in the sample is burned, show the OC curve. 


8. In Exercise 6, suppose the number of defective items is unknown and also suppose a 
rejected lot is subject to 100% inspection and that any defective item in the population 
is replaced by a good item. Estimate the AOQ limit from a graph of the AOQ. 


9. In Exercise 7, suppose that the entire batch of cookies is inspected if the sample should
reject the batch. Estimate the AOQ limit from a graph.

10. In inspecting a lot of 500 items, it is desired to accept the lot with probability 0.95
if the lot contains 1 defective item and to accept the lot with probability 0.05 if the lot
contains 20 defective items. Suppose the lot is accepted only if the sample
contains no defective items. What sample size is necessary?

11. A producer inspects a lot of 400 items and wants the probability the lot is accepted
if the lot contains 1% defectives to be 0.90; the consumer wants a lot
containing 5% defective items to be accepted with probability 0.60. Suppose the lot is
accepted only if the sample contains no defective items. Find the sample size so that
the sampling plan meets these risks.

12. A double sampling plan is carried out from a lot of 500 items. A sample of 10 is selected
and the lot is accepted if this sample contains no unacceptable items; if this sample
contains 3 or more unacceptable items, the lot is rejected. If the sample contains 1 or 2
unacceptable items, a second sample of 20 is drawn; the lot is then accepted if the total
number of unacceptable items in the two samples combined is at most 3. Suppose that
at any stage an unacceptable item is replaced by a good item.


(a) Find the probability a lot containing 15 unacceptable items is accepted. 

(b) Graph the probability in part (a) as a function of D, the number of unacceptable 
items in the lot. 

(c) Find the AOQL for this double sampling plan if unacceptable lots are rectified. 

(d) Approximate the probability that the lot is accepted using the binomial distribution. 


13. A lot of 400 items that actually contains 3 defective items is subject to the following
double sampling plan: the lot is accepted if a first sample of 5 contains no defectives; the
lot is rejected if this sample contains 2 or more defectives; if the first sample contains
1 defective, a second sample of 7 is drawn; the lot is accepted if the second sample
contains no more than 1 defective, otherwise the lot is rejected. Suppose that at any
stage an unacceptable item is replaced by a good item.


(a) What is the probability the lot is accepted? 

(b) What is the AOQ if all the defective items in unacceptable lots are replaced by 
good items? 

(c) Show the OC curve for the sampling plan. 

14. A day's production of 200 compact disks is inspected as follows. If an initial sample
of 15 shows at most 2 defective disks, the lot is accepted and is subject to no more
sampling. However, if the first sample shows 3 or more defective disks, then a second
sample of 20 disks is chosen and the lot is accepted if the total number of defectives in
the two samples is no more than 4.

(a) Find the probability the lot is accepted if, in fact, it contains 10 defective disks.

(b) Find the AOQ.

(c) Plot the OC curve.

15. A random sample of 100 items is chosen from a lot of 4500 items, which is 2% defective.
If the sample contains no more than 4 defective items, the lot is accepted; otherwise,
the remainder of the lot is inspected and defective items are replaced by good
items.

(a) What is the average number of items inspected?

(b) Graph the average number of items inspected as a function of the percentage defective
in the lot.


2.12 THE HYPERGEOMETRIC RANDOM VARIABLE:
FURTHER EXAMPLES


Example 2.12.1 A Lottery 


Lottery games have become popular in many states. The game is played as follows: a player
chooses five different white balls numbered from 1, 2, ..., 59. Another red ball is then
chosen from balls numbered from 1, 2, ..., 35. This choice, called a powerball, may match
one of the first five white balls chosen. Lottery officials then choose the five integers and
the powerball.

The number of integers the player correctly chooses from among the first five is a
hypergeometric random variable, which we will call X. Then

$$P(X = x) = \frac{\binom{5}{x}\binom{54}{5 - x}}{\binom{59}{5}}, \quad x = 0, 1, \ldots, 5.$$

Let Y be 1 if the player matches the powerball and 0 otherwise. Since the choices are
independent, it follows that

$$P(X = x \text{ and } Y = y) = \frac{\binom{5}{x}\binom{54}{5 - x}}{\binom{59}{5}} \cdot \frac{\binom{1}{y}\binom{34}{1 - y}}{\binom{35}{1}}, \quad x = 0, 1, \ldots, 5; \; y = 0, 1.$$

Here is a table of values of X and Y giving the probabilities with which the winning
combinations occur. The jackpot, J here, may vary from week to week.

X   Y   Probability                                  Payoff

5   1   1/175,223,510       = 5.7070 x 10^-9         J
5   0   17/87,611,755       = 1.9404 x 10^-7         $1,000,000
4   1   27/17,522,351       = 1.5409 x 10^-6         $10,000
4   0   918/17,522,351      = 5.2390 x 10^-5         $100
3   1   1,431/17,522,351    = 8.1667 x 10^-5         $100
3   0   48,654/17,522,351   = 2.7767 x 10^-3         $7
2   1   24,804/17,522,351   = 1.4156 x 10^-3         $7
1   1   316,251/35,044,702  = 9.0242 x 10^-3         $4
0   1   316,251/17,522,351  = 1.8048 x 10^-2         $4

The probability the player selects at least one of the five integers correctly is

$$1 - \frac{\binom{5}{0}\binom{54}{5}}{\binom{59}{5}} = 0.3683.$$

The probability the player wins something is

$$\frac{1}{175{,}223{,}510} + \frac{17}{87{,}611{,}755} + \frac{27}{17{,}522{,}351} + \frac{918}{17{,}522{,}351} + \frac{1431}{17{,}522{,}351} + \frac{48{,}654}{17{,}522{,}351} + \frac{24{,}804}{17{,}522{,}351} + \frac{316{,}251}{35{,}044{,}702} + \frac{316{,}251}{17{,}522{,}351} = 0.0314007.$$

What is the value of a ticket? The expected value of a ticket is

$$\frac{1}{175{,}223{,}510} \cdot J + 1{,}000{,}000 \cdot \frac{17}{87{,}611{,}755} + 10{,}000 \cdot \frac{27}{17{,}522{,}351} + 100 \cdot \frac{918}{17{,}522{,}351} + 100 \cdot \frac{1431}{17{,}522{,}351} + 7 \cdot \frac{48{,}654}{17{,}522{,}351} + 7 \cdot \frac{24{,}804}{17{,}522{,}351} + 4 \cdot \frac{316{,}251}{35{,}044{,}702} + 4 \cdot \frac{316{,}251}{17{,}522{,}351} \approx \frac{J}{175{,}223{,}510} + 0.3605.$$

So for the expected value of a ticket to be $3, the cost of a ticket including the guess
for the powerball, the jackpot, J, must be about $462,504,000.
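
The lottery calculations can be reproduced with exact rational arithmetic. The sketch below uses Python's `fractions` module; the payoff dictionary simply restates the prizes listed in the table of this example, and the names `joint`, `payoff`, and `fixed_value` are illustrative.

```python
from fractions import Fraction
from math import comb

def joint(x, y):
    """P(X = x and Y = y): x of the 5 white balls matched, y = 1 if the powerball is matched."""
    white = Fraction(comb(5, x) * comb(54, 5 - x), comb(59, 5))
    red = Fraction(comb(1, y) * comb(34, 1 - y), comb(35, 1))
    return white * red

# Fixed prizes keyed by (x, y); the jackpot (5, 1) is treated separately.
payoff = {(5, 0): 1_000_000, (4, 1): 10_000, (4, 0): 100, (3, 1): 100,
          (3, 0): 7, (2, 1): 7, (1, 1): 4, (0, 1): 4}

p_jackpot = joint(5, 1)
p_win = p_jackpot + sum(joint(x, y) for (x, y) in payoff)
fixed_value = sum(prize * joint(x, y) for (x, y), prize in payoff.items())

# Jackpot that makes the expected value of a ticket equal to its $3 price.
J = (3 - fixed_value) / p_jackpot
print(float(p_win), float(fixed_value), float(J))
```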


Example 2.12.2 A Card Game 


A bridge hand consists of 13 cards chosen without replacement from a deck of 52 cards. 
What is the most likely distribution of suits in the bridge hand? 

In the hypergeometric random variable, the sampling is done from a population con- 
taining two kinds of items; here, we generalize the distribution somewhat to sample from 
a population containing four kinds of items, namely, the suits in the deck. 

It would appear, since the suits all occur with equal frequency in the deck, that the most
likely distribution of suits might be four of one suit and three each of the remaining three
suits. This has probability

$$P(4, 3, 3, 3) = \frac{\binom{4}{3} \cdot \binom{13}{3}^{3} \cdot \binom{13}{4}}{\binom{52}{13}}$$

since we first choose three suits; then we choose three cards from each of those suits; finally,
we choose four cards from the remaining suit.

However, the distribution of four cards from each of two suits, three cards from one
suit, and two cards from the remaining suit has probability

$$P(4, 4, 3, 2) = \frac{\binom{4}{2} \cdot \binom{13}{4}^{2} \cdot \binom{2}{1} \cdot \binom{13}{3} \cdot \binom{13}{2}}{\binom{52}{13}}.$$

We find that P(4, 4, 3, 2)/P(4, 3, 3, 3) = 45/22, so the distribution of four cards from each
of two suits, three cards from another suit, and the remaining two cards from the fourth suit
is over twice as likely as the more uniform distribution of suits. Any other combination of
suits is less likely than the combination found here.
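
The claim that no other suit distribution is more likely can be checked by enumerating every pattern. The following sketch does so with exact fractions; the function `pattern_probability` and the way patterns are generated are choices made for this illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

def pattern_probability(pattern):
    """Probability that a 13-card hand has the given suit lengths, in any suit order."""
    assignments = set(permutations(pattern))   # distinct ways to assign lengths to suits
    ways = 1
    for k in pattern:
        ways *= comb(13, k)                    # choose k cards from a 13-card suit
    return Fraction(len(assignments) * ways, comb(52, 13))

# All patterns a >= b >= c >= d >= 0 with a + b + c + d = 13.
patterns = [(a, b, c, d)
            for a in range(14) for b in range(a + 1)
            for c in range(b + 1) for d in range(c + 1)
            if a + b + c + d == 13]

probs = {p: pattern_probability(p) for p in patterns}
most_likely = max(probs, key=probs.get)
print(most_likely, float(probs[most_likely]))
print(probs[(4, 4, 3, 2)] / probs[(4, 3, 3, 3)])
```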


2.13 THE POISSON RANDOM VARIABLE 


In Section 2.4, we considered the binomial random variable whose probability distribution
is

$$P(X = x) = \binom{n}{x} \cdot p^{x} \cdot q^{n - x}, \quad x = 0, 1, 2, \ldots, n,$$

where q = 1 - p.

Events that have small probability, that is, rare events, are of particular interest and we
turn our attention to them.

We calculated the recursion from the binomial probability distribution function:

$$P(X = x) = \frac{n - x + 1}{x} \cdot \frac{p}{q} \cdot P(X = x - 1), \quad x = 1, 2, \ldots, n. \tag{2.16}$$

Now notice that the recursion could also be written as

$$P(X = x) = \frac{n \cdot p - (x - 1) \cdot p}{(1 - p) \cdot x} \cdot P(X = x - 1).$$

In this form of the recursion, we fix x and let n → ∞ and p → 0 while keeping np,
which we denote by λ, fixed. These presumptions allow us to concentrate on events that are
rare. We see that under these conditions, in the limit we have

$$P(X = x) = \frac{\lambda}{x} \cdot P(X = x - 1), \quad x = 1, 2, \ldots. \tag{2.17}$$


Our task now is to determine the function P(X = x), which satisfies the recursion (2.17). 
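Applying the recursion repeatedly, we find that

$$P(X = 1) = \lambda \cdot P(X = 0),$$

$$P(X = 2) = \frac{\lambda}{2} \cdot P(X = 1) = \frac{\lambda^{2}}{2!} \cdot P(X = 0),$$

$$P(X = 3) = \frac{\lambda}{3} \cdot P(X = 2) = \frac{\lambda^{3}}{3!} \cdot P(X = 0),$$

and so on. Since $\sum_{x=0}^{\infty} P(X = x) = 1$, it follows that

$$P(X = 0) \cdot \left\{ 1 + \lambda + \frac{\lambda^{2}}{2!} + \frac{\lambda^{3}}{3!} + \cdots \right\} = 1$$

or

$$P(X = 0) \cdot e^{\lambda} = 1,$$

so $P(X = 0) = e^{-\lambda}$, $P(X = 1) = \lambda \cdot e^{-\lambda}$, and

$$P(X = 2) = \frac{\lambda^{2} \cdot e^{-\lambda}}{2!}.$$

We conjecture then that

$$P(X = x) = \frac{e^{-\lambda} \cdot \lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots. \tag{2.18}$$

Formula (2.18) defines the Poisson probability distribution with parameter λ. The
reader should check that (2.18) satisfies (2.17). It is also easy to check that (2.18) is a
probability distribution. Obviously, P(X = x) > 0 and

$$\sum_{x=0}^{\infty} P(X = x) = \sum_{x=0}^{\infty} \frac{e^{-\lambda} \cdot \lambda^{x}}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{\lambda^{x}}{x!} = e^{-\lambda} \left( 1 + \lambda + \frac{\lambda^{2}}{2!} + \frac{\lambda^{3}}{3!} + \cdots \right) = e^{-\lambda} \cdot e^{\lambda} = 1.$$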

So (2.18) is a probability distribution function. 
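
The check that (2.18) satisfies (2.17) can also be carried out numerically. The sketch below builds the probabilities from the recursion alone, starting from P(X = 0) = e^{-λ}, and compares them with the closed form; the value λ = 3 and the cutoff at x = 60 are arbitrary choices for illustration.

```python
from math import exp, factorial

def poisson_by_recursion(lam, x_max):
    """P(X = 0), ..., P(X = x_max) from P(X = x) = (lam / x) * P(X = x - 1)."""
    probs = [exp(-lam)]                  # P(X = 0) = e^{-lam}
    for x in range(1, x_max + 1):
        probs.append(lam / x * probs[-1])
    return probs

lam = 3.0
probs = poisson_by_recursion(lam, 60)
closed_form = [exp(-lam) * lam**x / factorial(x) for x in range(61)]

largest_gap = max(abs(a - b) for a, b in zip(probs, closed_form))
total = sum(probs)                       # should be very close to 1
print(largest_gap, total)
```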


Mean and Variance of the Poisson 


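It should be no surprise that the mean of the Poisson distribution is λ = np since we found
the Poisson by taking the limit of the binomial with mean np and keeping np fixed. The
calculation is as follows:

$$\mu = \sum_{x=0}^{\infty} x \cdot P(X = x) = \sum_{x=0}^{\infty} x \cdot \frac{e^{-\lambda} \cdot \lambda^{x}}{x!} = e^{-\lambda} \cdot \lambda \cdot \sum_{x=1}^{\infty} \frac{\lambda^{x - 1}}{(x - 1)!} = e^{-\lambda} \cdot \lambda \cdot e^{\lambda} = \lambda.$$

For the variance, it is easiest to calculate E[X · (X - 1)] and then make use of the fact
that

$$\text{Var}(X) = E[X \cdot (X - 1)] + E(X) - [E(X)]^{2}.$$

Here,

$$E[X \cdot (X - 1)] = \sum_{x=0}^{\infty} x \cdot (x - 1) \cdot \frac{e^{-\lambda} \cdot \lambda^{x}}{x!} = e^{-\lambda} \cdot \lambda^{2} \cdot \sum_{x=2}^{\infty} \frac{\lambda^{x - 2}}{(x - 2)!} = e^{-\lambda} \cdot \lambda^{2} \cdot e^{\lambda} = \lambda^{2},$$

from which it follows that

$$\text{Var}(X) = \lambda^{2} + \lambda - \lambda^{2} = \lambda.$$

That should not really be much of a surprise. After all, the variance of the binomial is
n · p · q = n · p · (1 - p) = λ · (1 - p). To find the Poisson, we let λ stay fixed and p → 0.
So λ · (1 - p) → λ.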
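
As a quick numerical illustration of these two results, the following sketch approximates the mean and variance of a Poisson variable by truncated sums. The parameter value 2.5 and the truncation point 100 are arbitrary assumptions made for the example; the neglected tail is negligible for a parameter this small.

```python
from math import exp

def poisson_probs(lam, x_max):
    """P(X = 0), ..., P(X = x_max), built with the recursion P(x) = (lam / x) * P(x - 1)."""
    probs = [exp(-lam)]
    for x in range(1, x_max + 1):
        probs.append(probs[-1] * lam / x)
    return probs

lam = 2.5
probs = poisson_probs(lam, 100)

mean = sum(x * p for x, p in enumerate(probs))
factorial_moment = sum(x * (x - 1) * p for x, p in enumerate(probs))   # E[X(X - 1)]
variance = factorial_moment + mean - mean**2
print(mean, variance)
```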


Some Comparisons 


The Poisson distribution was derived here as an approximation to the binomial distribution. 
It is now interesting to compare some binomial distributions and their Poisson approxi- 
mations to measure, to some extent, how close the approximation is. We use a computer 
algebra system to make the calculations and the graphs. 

First, consider a binomial variable with n = 20 and p = 0.03. Here, the value of n is 
not particularly large nor is p particularly small. 

We show here P(X = x) using both the binomial distribution and the Poisson approxi- 
mation. 


x    Binomial       Poisson

0    0.543794       0.548812
1    0.336358       0.329287
2    0.0988297      0.0987861
3    0.0183395      0.0197572
4    0.00241061     0.00296358
5    0.000238576    0.00035563


So the values are very close. The graphs in Figure 2.25 reveal the same observation. 

As n increases, the approximation is generally very good. Figure 2.26 shows a comparison
between a binomial with n = 100 and p = 0.03 and a Poisson distribution with
λ = 3.
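
The comparisons in this section can be reproduced with a few lines of code. The sketch below tabulates the binomial with n = 20, p = 0.03 against the Poisson with parameter 0.6, and then finds where the binomial with n = 100, p = 0.03 and the Poisson with parameter 3 differ the most. The helper names `binom_pmf` and `poisson_pmf` are illustrative.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

# Small case: n = 20, p = 0.03, so the Poisson parameter is n * p = 0.6.
n, p = 20, 0.03
for x in range(6):
    print(x, round(binom_pmf(x, n, p), 6), round(poisson_pmf(x, n * p), 6))

# Larger case: n = 100, p = 0.03; locate the largest pointwise difference.
n, p = 100, 0.03
diffs = {x: abs(binom_pmf(x, n, p) - poisson_pmf(x, n * p)) for x in range(n + 1)}
x_worst = max(diffs, key=diffs.get)
print(x_worst, diffs[x_worst])
```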

The curves, which exhibit again the normal-like shape, are remarkably close and differ
the most at the maximum point; this difference is 0.00343232.

Figure 2.25 (a) Poisson distribution with parameter 0.60. (b) Binomial distribution
with n = 20, p = 0.03.

Figure 2.26 (a) Poisson distribution with parameter 3. (b) Binomial distribution with
n = 100, p = 0.03.

Example 2.13.1


An acceptance sampling plan selects 5 items from a population of 500 items, 16 of which
are unacceptable. The lot is accepted if at most 2 of the sampled items are unacceptable.
Here, we compare the exact (hypergeometric) probability with both binomial and Poisson
approximations.

The hypergeometric probability is

$$P(X \le 2) = \sum_{x=0}^{2} \frac{\binom{16}{x}\binom{484}{5 - x}}{\binom{500}{5}} = 0.99974.$$

The binomial approximation uses n = 5 and p = 16/500 and is equivalent to sampling with
replacement, so

$$P(X \le 2) = \sum_{x=0}^{2} \binom{5}{x} \cdot \left( \frac{16}{500} \right)^{x} \cdot \left( \frac{484}{500} \right)^{5 - x} = 0.999688.$$

Finally, the Poisson distribution with λ = 5 · (16/500) = 0.16 gives

$$P(X \le 2) = \sum_{x=0}^{2} \frac{e^{-0.16} \cdot (0.16)^{x}}{x!} = 0.999394.$$

The approximations continue to be very good. Actually, the error in the Poisson distribution
when approximating the binomial is not easily characterized, but many advise that
its use is best when np < 5, a rule that generally works fairly well.

Figure 2.26 compares the binomial with n = 100 and p = 0.03 to the Poisson distribution
with λ = np = 3. The Poisson approximation is seen to be remarkably good.
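
The three probabilities in Example 2.13.1 can be computed side by side. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from math import comb, exp, factorial

N, D, n, c = 500, 16, 5, 2     # lot size, unacceptable items, sample size, acceptance number

# Exact: sampling without replacement (hypergeometric).
hyper = sum(comb(D, x) * comb(N - D, n - x) for x in range(c + 1)) / comb(N, n)

# Binomial approximation with p = D / N.
p = D / N
binom = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c + 1))

# Poisson approximation with parameter n * p.
lam = n * p
poisson = sum(exp(-lam) * lam**x / factorial(x) for x in range(c + 1))

print(round(hyper, 6), round(binom, 6), round(poisson, 6))
```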


2.14 THE POISSON PROCESS 


The Poisson distribution serves as an approximation to the binomial distribution. The bino- 
mial distribution models situations where we can observe both the number of successes and 
the number of failures for a certain number of trials of an experiment. 

We now turn our attention to other events that occur in time or space. We may be 
interested in the following examples: the number of faults in a fixed length of optic cable; 
the number of customers arriving at a checkout counter in a store; the number of telephone 
calls received at a telephone switchboard; or the number of messages received at a computer 
terminal. In each of these examples, we can count the number of occurrences of an event 
(such as the number of faults in the optic cable), but we cannot count the number of failures. 
How can such phenomena be modeled? 

Consider a continuous interval of length, time, or space and suppose that the following
are true for the events we wish to observe:

1. The numbers of events in intervals having no points in common are independent.

2. Consider a short interval of length h. Suppose the probability of exactly one event
in this interval is λ · h, where λ is a constant of proportionality.

3. The probability of more than one event in the interval of length h is 0.

Now divide an interval of unit length into n mutually exclusive parts. By assumption
(2), the probability of exactly one event in each part is λ · (1/n), and so the probability
of no events in that part is 1 - λ/n. Letting X denote the number of events in the unit
interval, assumptions (1) and (3) allow us to calculate

$$P(X = x) = \binom{n}{x} \cdot \left( \frac{\lambda}{n} \right)^{x} \cdot \left( 1 - \frac{\lambda}{n} \right)^{n - x}, \quad x = 0, 1, \ldots, n.$$

But we have shown that this can be approximated by a Poisson variable with mean
value n · (λ/n) = λ. So the Poisson distribution can be used in situations for which
assumptions (1)-(3) hold. We see that λ is the expected number of events in a period of
time or in an interval of space.


Example 2.14.1 
Calls come into a telephone switchboard at a rate of 4 per minute. 


(a) Find the probability of exactly 6 calls in an interval of 2 minutes. 


(b) Find the probability of at least 3 calls in 3 minutes. 
Here, the interval of interest changes. We present two different solutions. 


Solution 1 

In part (a) the interval is of length 2 minutes so we might suspect that λ, the expected
number of events in that interval, is 8. Proceeding on that assumption, and letting X denote
the number of calls received in that interval, we have

$$P(X = 6) = \frac{e^{-8} \cdot 8^{6}}{6!} = 0.122138.$$

In part (b), the interval is 3 minutes and so λ = 12 here. We find

$$P(X \ge 3) = 1 - P(X \le 2) = 1 - \sum_{x=0}^{2} \frac{e^{-12} \cdot 12^{x}}{x!} = 0.999478.$$

It is not clear, however, that we can change the interval and retain a Poisson 
variable. 


Solution 2
(a) Let the random variables $X_1$ and $X_2$ be defined as follows:
Let $X_1$ denote the number of calls received during the first minute
and let $X_2$ denote the number of calls received during the second minute.

Since the numbers of calls received during the first and second minutes are independent,

$$P(X_1 + X_2 = 6) = \sum_{x_1=0}^{6} P(X_1 = x_1) \cdot P(X_2 = 6 - x_1) = \sum_{x_1=0}^{6} \frac{e^{-4} \cdot 4^{x_1}}{x_1!} \cdot \frac{e^{-4} \cdot 4^{6 - x_1}}{(6 - x_1)!}.$$

Multiply and divide by 6! to obtain

$$P(X_1 + X_2 = 6) = \frac{e^{-8}}{6!} \sum_{x_1=0}^{6} \binom{6}{x_1} \cdot 4^{x_1} \cdot 4^{6 - x_1}$$

and now by the binomial theorem,

$$P(X_1 + X_2 = 6) = \frac{e^{-8}}{6!} \cdot (4 + 4)^{6} = \frac{e^{-8} \cdot 8^{6}}{6!},$$

giving the same result as in Solution 1.

Solution 2 indicates that the sum of independent Poisson random variables is again
a Poisson random variable. We will return to a discussion of this fact and related facts in
Chapter 5 where Solution 1 will be completely justified.
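
Both solutions of Example 2.14.1 can be checked numerically. The sketch below evaluates the Poisson probabilities of Solution 1 directly and then repeats part (a) by summing over the split of the 6 calls between the two minutes, as in Solution 2. The helper name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

rate = 4   # calls per minute

# Solution 1, part (a): exactly 6 calls in 2 minutes, parameter 8.
p_a = poisson_pmf(6, rate * 2)

# Solution 2, part (a): two independent one-minute counts, each with parameter 4.
p_a_conv = sum(poisson_pmf(x1, rate) * poisson_pmf(6 - x1, rate) for x1 in range(7))

# Solution 1, part (b): at least 3 calls in 3 minutes, parameter 12.
p_b = 1 - sum(poisson_pmf(x, rate * 3) for x in range(3))

print(round(p_a, 6), round(p_a_conv, 6), round(p_b, 6))
```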


EXERCISES 2.14 


1. Show the Poisson approximation to the binomial distribution with n = 5 and probability
0.2 and draw a graph of these probabilities.

2. Show a recursion for the Poisson distribution in problem 1 and use it to calculate the
probabilities in problem 1.

3. A Poisson random variable has


P(X =2)= SP(X = 1). 


Find P(X = 3). 


4. Deaths in a small city occur at a rate of 5 per week and are known to follow a Poisson
distribution.

(a) What is the expected number of deaths in a 3-day period? 

(b) What is the probability no one dies in a 3-day period? 

(c) What is the probability that at least 250 people die in 52 weeks? 


5. Traffic accidents at an intersection are assumed to follow a Poisson distribution with 4
accidents expected in a period of 1 year.
(a) What is the probability of at most 1 accident in a given year?
(b) What is the probability of exactly 3 accidents in 6 months?
(c) It is expected that 2 accidents occur during a year at another intersection. What is
the probability that there is a total of at least 3 accidents in a given year at the two
intersections?

6. The number of typographical errors per page in a book follows a Poisson distribution
with parameter 3/4. What is the probability that there is a total of 10 errors on 10
randomly selected pages in the book?


7. Twenty percent of the IC chips made in a plant are nonfunctional. Assume that a binomial
model is appropriate.


(a) Find the probability that at most 13 nonfunctional chips occur in a sample of 100 
chips. 
(b) Use the Poisson distribution to approximate the result in part (a). 


8. Let X, the number of hits in a baseball game, be a Poisson variable with parameter a.
If the probability of a no-hit game is 1/3, what is a?


9. An insurance company has discovered that about 0.1% of the population is involved in
a certain type of accident each year. If the 10,000 policy holders of the company are
randomly selected from the population, what is the probability that not more than 5 of
its clients are involved in such an accident next year?


10. A study of customers entering a grocery store shows that all the arrivals are Poisson
with males entering on an average rate of 3 per minute and females at an average rate 
of 5 per minute. Find the probability that at least 20 customers enter the store in the 
next 5 minutes. 


11. Computer programs run on a certain computer are executed during an interval of 1
minute according to a Poisson process with mean 12. Twenty-five percent of these 
programs utilize a plotter. 


(a) What is the probability there will be a demand for at least 15 programs run in a 
given minute? 

(b) The plotter takes 10 seconds to execute a plot. What is the expected number of 
seconds the plotter is in use during a given minute? 


12. A multiple choice examination contains 4 choices for each of 100 questions.
(a) Find the exact probability a student who guesses misses at most 4 questions. 
(b) Approximate the probability in part (a) using the Poisson distribution. 


13. The number of earthquakes of destructive magnitude in California follows a Poisson
distribution with one such earthquake expected each year. What is the probability of at 
least three such earthquakes in a 6-month period? 


14. A quality control inspector follows the following plan in inspecting soccer balls that are
produced according to a Poisson process with four soccer balls expected each minute. 
The produced balls fall into a bin that automatically empties at the end of each minute. If 
the bin collects exactly three balls, the inspector takes them out for possible inspection 
of 10 seconds each. He flips a fair coin for each and inspects them only if heads appear. 
If the bin should contain five balls, he spends 5 seconds inspecting each ball. Otherwise, 
the inspector does not inspect the output. What is the average amount of time per minute 
spent in inspecting the soccer balls? 


15. Major crimes are reported at an average rate of 5 per night in a given police precinct.
The number of these crimes is assumed to follow a Poisson distribution.

(a) What is the probability that on a given night no more than three major crimes will
be reported?

(b) What is the chance that a full week will pass with no more than three major crimes
reported on any of the seven nights?

16. An airline knows that 10% of the people holding reservations for a certain flight will
not appear. The plane holds 90 people. Use the Poisson approximation in answering
the following questions:

(a) If 95 reservations have been sold, what is the probability that everyone who appears
for the flight can be accommodated?

(b) How many reservations should be sold so that the probability the airline can accom- 
modate everyone who appears is at least 0.99? 

17. Molecules of a rare gas occur at an average rate of 3 per cubic foot of air and follow a
Poisson distribution.

(a) What is the probability that a cubic foot of air contains none of the molecules? 

(b) What is the probability that 3 cubic feet of air contain exactly four of the 
molecules? 

(c) How much air must be taken as a sample to make the probability at least 0.99 so 
that at least one molecule will be found? 

18. A librarian shelves 1000 books per day. If the probability any particular book is
misshelved is 0.001 and if the books are shelved independently of each other,

(a) What is the probability that at most 2 books are misshelved? 

(b) Approximate the probability in part (a) using the Poisson distribution. 

19. A popular chocolate chip cookie "guarantees" at least 16 chocolate chips per cookie.
The actual number of chocolate chips per cookie, however, is a Poisson random variable.
What must be the average number of chips per cookie if approximately 95% or
more of the cookies are to meet the guarantee?

20. A bakery makes a batch of 1000 chocolate chip cookies and adds n chocolate chips
to the batter for each batch and mixes the batter well. Under these assumptions, the
number of chocolate chips per cookie should follow a Poisson distribution.

(a) If n = 4900, what is the probability that at least 2 chips are in a randomly selected 
cookie? 

(b) If n = 4900, what is the number of cookies in each batch that are expected to 
contain exactly 3 chocolate chips? 

(c) FDA regulations declare that at most 1% of cookies labeled “chocolate chip” can 
fail to contain a single chocolate chip. What is the minimum value for n for the 
bakery to be within the law? 

A truck repair shop has facilities for the repair of 3 large trucks per day. The trucks 

arrive according to a Poisson process with 2 trucks expected per day. If more than 3 

trucks arrive, the excess is turned away. 

(a) Find the probability exactly 3 trucks arrive in 1 day. 

(b) Find the probability that trucks are turned away. 

(c) Find the probability distribution for X, the number of trucks serviced per day. 

(d) Find the expected number of trucks turned away each day. 

(e) The shop decides to add facilities so that it can service the trucks arriving during 
a day about 95% of the time. How many trucks must it be able to service in a day? 

Calls come into a very busy switchboard at a rate of 6 per minute according to a Poisson 

process. Unfortunately some new electronic switching devices work imperfectly and 

the probability a received call is switched to the proper extension is only 0.8. It has 
been observed that the calls are switched independently, however. 
(a) If X represents the number of calls correctly switched, find P(X = k) for some 
1-minute period. 
(b) Simplify the result in part (a) and show that X is Poisson with parameter 4.8. 


23. Telephone calls coming into a busy switchboard follow a Poisson distribution with 4 
calls expected in a 1-minute period. The switchboard, however, can answer at most 6 
calls in a 1-minute interval; any calls exceeding 6 during that period receive a busy 
signal. 

(a) Let Y denote the number of calls answered in a 1-minute period. Find the proba- 
bility distribution for Y. 
(b) Find E(Y). 


CHAPTER REVIEW 


This chapter considers several discrete probability distributions whose importance arises 
from the fact that they have various applications. Each of the distributions in this chapter 
arises in one way or another from the binomial distribution. 

We began by defining a random variable as a real-valued function defined on the points 
of a sample space. A typical example is throwing two dice and then recording the sum that 
appears. The sum is a random variable since it is a function, in this case the sum, of the 
outcomes of the particular sample point that occurs. 

If X is a random variable, we defined the probability distribution function, or pdf, as 

\[ f(x) = P(X = x). \]

A related function is the distribution function, defined as 

\[ F(x) = P(X \le x). \]


The distribution function is not often used in this chapter, but has very important appli- 
cations in the work to come. 

Probability distributions are often distinguished and described by the values of their 
mean, \(\mu_x\), and their variance, \(\sigma_x^2\). These are defined as 

\[ \mu_x = E(X) = \sum_x x \cdot f(x) \quad \text{and} \]

\[ \sigma_x^2 = \operatorname{Var}(X) = E(X - \mu_x)^2 = \sum_x (x - \mu_x)^2 \cdot f(x), \]

provided, of course, that the sums exist. The variance, \(\sigma_x^2\), can also be calculated as 

\[ \sigma_x^2 = E(X^2) - [E(X)]^2. \]


As a (crude) indication that \(\sigma\) actually measures the variation, or dispersion, in a random 
variable, we proved Tchebycheff’s inequality: 

\[ P(|X - \mu| \le k \cdot \sigma) \ge 1 - \frac{1}{k^2}, \]


where k is some positive quantity. 
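
The definitions above can be checked on the two-dice example mentioned earlier in this review. The following is only a small sketch in Python (the text itself uses no code; the names `f`, `mu`, `var`, and `prob_within` are chosen here for illustration). It builds the pdf of the sum of two fair dice with exact fractions, computes the mean and the variance both ways, and compares the exact probability of lying within k standard deviations of the mean with the Tchebycheff bound.

```python
from fractions import Fraction
from itertools import product

# pdf of X = sum of two fair dice: f[x] = P(X = x)
f = {}
for a, b in product(range(1, 7), repeat=2):
    f[a + b] = f.get(a + b, Fraction(0)) + Fraction(1, 36)

mu = sum(x * px for x, px in f.items())                      # E(X)
var = sum((x - mu) ** 2 * px for x, px in f.items())         # E(X - mu)^2
var_shortcut = sum(x * x * px for x, px in f.items()) - mu ** 2  # E(X^2) - [E(X)]^2

def prob_within(k):
    """Exact P(|X - mu| <= k*sigma); squares are compared to avoid square roots."""
    return sum(px for x, px in f.items() if (x - mu) ** 2 <= k * k * var)

for k in (1, 2, 3):
    bound = 1 - Fraction(1, k * k)   # Tchebycheff lower bound
    print(k, prob_within(k), bound)
```

For k = 2 the exact probability is 17/18, well above the bound of 3/4, which illustrates why the inequality is described as a crude indication.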
We then turned to some specific discrete probability distributions. Of these, the single 
most important probability distribution is the binomial distribution whose pdf is given by 

\[ P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n, \quad \text{where } q = 1 - p. \]


This random variable arises from an experiment of n independent trials on each of 
which the result is one of two outcomes (usually denoted by “success” or “failure’”), where 
p denotes the probability of success and X denotes the total number of successes. 

We used a recursion to find that, for the binomial distribution, 


\[ \mu = n \cdot p \quad \text{and} \quad \sigma^2 = n \cdot p \cdot q. \]

We then considered some statistical problems. We first considered the construction of 
a confidence interval when sampling from a binomial distribution with known values of 


n and p. Frequently, however, p is unknown. We found an approximate 95% confidence 
interval for p to be 


\[ P\left( \frac{nX + 2n - 2\sqrt{n^2 X + n^2 - nX^2}}{n^2 + 4n} \le p \le \frac{nX + 2n + 2\sqrt{n^2 X + n^2 - nX^2}}{n^2 + 4n} \right) = 0.95 \]

where X is the observed number of successes in the binomial process with n trials. 
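
As a hedged illustration of the interval quoted above, the following Python sketch evaluates its two endpoints. The function name `approx_95_ci` and the sample values X = 40, n = 100 are assumptions made here for the example; they do not come from the text.

```python
from math import sqrt

def approx_95_ci(x, n):
    """Endpoints (n*x + 2n -/+ 2*sqrt(n^2*x + n^2 - n*x^2)) / (n^2 + 4n)."""
    root = sqrt(n * n * x + n * n - n * x * x)
    denom = n * n + 4 * n
    lower = (n * x + 2 * n - 2 * root) / denom
    upper = (n * x + 2 * n + 2 * root) / denom
    return lower, upper

# Hypothetical data: 40 successes observed in 100 trials.
lower, upper = approx_95_ci(x=40, n=100)
print(lower, upper)
```

The endpoints are the two values of p at which the observed X sits exactly two standard deviations from its mean, that is, the roots of (X - np)^2 = 4np(1 - p).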
Tests of hypotheses were then considered. We examined tests of 
\(H_0: p = p_0\) against the alternative 
\(H_a: p = p_a\). 


The two types of error in testing a hypothesis are 


\(\alpha\) = Probability of a Type I error 
= P[\(H_0\) is rejected when it is true] and 
\(\beta\) = Probability of a Type II error 
= P[\(H_0\) is accepted when it is false]. 

We considered the effect of the critical region (the set of observed values leading to 
the rejection of the null hypothesis) on the size of \(\beta\) and discussed \(1 - \beta\), the power of the 
test. This is the probability that a false \(H_0\) is correctly rejected. 


We derived the mean and the variance of a sample proportion arising from a sample 
survey. Using these results, we found that 


\[ p_s - \frac{1}{\sqrt{n}} \le p \le p_s + \frac{1}{\sqrt{n}} \]

is a 95% confidence interval for the unknown true proportion p based on a sample proportion 
\(p_s\). 

The negative binomial distribution arises when we, in a binomial experiment, wait for, 
say, the rth success. The probability distribution function is 


\[ P(X = x) = \binom{x-1}{r-1} p^r q^{x-r}, \quad x = r, r+1, r+2, \ldots \]

with mean \(\mu = \frac{r}{p}\) and variance \(\sigma^2 = \frac{rq}{p^2}\). In the special case where r = 1, so that we wait 
for the first binomial success, X is called a geometric random variable. 

A common situation in which the hypergeometric random variable arises is that of 
acceptance sampling. Here a lot, or a collection of a product manufactured over a given 
period of time, is sampled, but, unlike the binomial distribution, the sampling is done with- 
out replacement. If the lot actually contains D unacceptable items and N — D acceptable 
items and if X denotes the number of unacceptable items in a sample of size n, then 


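\[ P(X = x) = \frac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, 2, \ldots, \operatorname{Min}\{n, D\}. \]

We found that \(\mu = n \cdot \frac{D}{N}\) and that \(\sigma^2 = n \cdot \frac{N-n}{N-1} \cdot \frac{D}{N} \cdot \frac{N-D}{N}\). We showed that the 
hypergeometric random variable is approximated by the binomial random variable when the 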
sample size, n, is small in comparison to the lot size, N. Examples of acceptance sam- 
pling were given, and we considered two plans for improving the quality of the lot of items 
sent to the buyer. In one plan we replaced any unacceptable items in the sample by good 
items; in the second plan, if the sample so indicated, we replaced every unacceptable item 
in the lot by a good item. Each plan leads to gains with respect to the quality of the outgoing 
product; under the second plan there is a limit of the percentage of unacceptable product 
that can be sold. This is known as the AOQ limit. 
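
The binomial approximation to the hypergeometric distribution described above can be seen numerically. This is a minimal sketch; the lot size N = 1000, the D = 50 unacceptable items, and the sample size n = 10 are hypothetical values chosen so that n is small compared with N.

```python
from math import comb

def hypergeom_pmf(x, N, D, n):
    """P(X = x): x unacceptable items in a sample of n drawn without replacement."""
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

N, D, n = 1000, 50, 10          # hypothetical lot and sample sizes
support = range(min(n, D) + 1)

for x in range(4):
    print(x, round(hypergeom_pmf(x, N, D, n), 4), round(binom_pmf(x, n, D / N), 4))

# Mean and variance computed directly from the hypergeometric pdf
mean = sum(x * hypergeom_pmf(x, N, D, n) for x in support)
var = sum((x - mean) ** 2 * hypergeom_pmf(x, N, D, n) for x in support)
print(mean, var)
```

The computed mean and variance agree with n(D/N) and n(N - n)/(N - 1) (D/N)((N - D)/N), and the two columns of probabilities are close because only 1% of the lot is sampled.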

Finally, we considered a Poisson random variable that can be regarded in two ways: 
we first found the distribution as a limit of the binomial distribution when n is large and p is 
small. We also considered the Poisson process in which events occur over a period of time 
or space in an independent manner so that the probability of more than one independent 
event in a given interval is negligible, and that the probability of an event in some interval 
is proportional to the length of the interval. These assumptions yield the same distribution 
as the limiting binomial distribution. The Poisson distribution has a variety of applications, 
many of which were given in the exercises. 
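
The limiting relationship between the binomial and Poisson distributions can be sketched as follows. The Poisson pdf used here, e^(-λ) λ^x / x!, is the standard form and is assumed rather than quoted from this review; the value λ = 2 and the helper `max_gap` are illustrative choices.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    # Standard Poisson pdf (assumed form): e^(-lam) * lam^x / x!
    return exp(-lam) * lam**x / factorial(x)

def max_gap(n, lam=2.0, upto=10):
    """Largest pointwise difference between Binomial(n, lam/n) and Poisson(lam) on 0..upto."""
    p = lam / n
    return max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(upto + 1))

for n in (10, 100, 1000):
    print(n, max_gap(n))
```

Holding np fixed at λ while n grows and p shrinks, the largest difference between the two sets of probabilities decreases steadily.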

In the next chapter, we will consider some important continuous probability 
distributions. 


PROBLEMS FOR REVIEW 


Exercises 2.2 # 1, 3, 4, 8, 11 
Exercises 2.3 # 1, 2, 5, 6, 7, 11 
Exercises 2.5 # 1, 2, 4, 7, 8, 9, 12, 15, 17, 19, 22 
Exercises 2.6 # 1, 4, 5, 8, 9 
Exercises 2.7 # 1, 4, 6, 7, 10 
Exercises 2.8 # 1, 3, 4, 8, 9 
Exercises 2.9 # 1, 2, 5, 7, 9, 13 
Exercises 2.10 # 1, 3, 5, 6, 8, 10 
Exercises 2.11 # 2, 4, 5, 6, 9, 11 
Exercises 2.14 # 2, 3, 5, 6, 8, 13, 18, 20 


SUPPLEMENTARY EXERCISES FOR CHAPTER 2 


1. Calls come into a telephone exchange at a rate of 1.5 per minute. Assuming that the 
number of calls received follows a Poisson distribution, find the probability that at least 
3 calls are received in the next 4 minutes. 


2. Twenty percent of the IC chips made in a plant are defective. Assume that the chips are 
produced according to a binomial process. 
(a) Find the probability that at most 13 defectives occur in a sample of 100 IC chips. 
(b) Approximate the probability in part (a) by a Poisson random variable. 


3. A manufacturer of soft drink bottles turns out defectives with probability 0.10. Assume 
that the bottles are produced according to a binomial process. 
(a) Find the probability that there are 4 defective bottles among the next 10 bottles 
produced. 
(b) Find the probability that there are at least 4 defective bottles among the next 10 
bottles produced. 
(c) How many bottles must be produced to make the probability that at least one bottle 
among them is defective at least 0.95? 


4. Earthquakes in a certain part of California occur according to a Poisson process with 
three earthquakes expected each century. 
(a) What is the probability of exactly four earthquakes in a century? 
(b) What is the probability of at least two earthquakes in a 50-year period? 
(c) Let X be the number of earthquakes in a century. Compare the exact value of 
\(P(\mu - \sigma < X < \mu + \sigma)\) with the approximation given by Tchebycheff’s inequality. 


5. Suppose an event has probability p of occurring and that several independent trials are 
observed. What value of p maximizes the probability that the first failure occurs on the 
fifth trial? 


6. Suppose that X and Y are independent observations of a Poisson random variable with 
parameter \(\mu = 1\). Find the probability that the smaller of the two observations is 1. 


7. A series of trials in which success or failure occurs on each trial has probability of 


success at the ith trial as ror In three trials, find the probability of exactly 2 successes. 


8. A manufacturer makes a lot of 10 items a day. Two items are drawn (without replacement) 
and inspected. The lot is accepted if the sample contains at most 1 defective item. 
Find the probability a lot containing 3 defective items is accepted. 


9. (a) What is the probability that a poker hand contains exactly 2 aces? 
(b) How many poker hands must be selected to make the probability of having at least 
one hand containing at least 2 aces be at least 0.99? 

10. A store sells chocolate donuts at a rate of 16 per hour, the number sold following a 
Poisson distribution. Find the probability that the store sells at least 3 chocolate donuts 
in 15 minutes. 

11. Five defective transistors are mixed up with 10 good transistors. They are inspected 
one after another until all the good transistors have been found. What is the probability 
the last good transistor will be found on the 12th test? 

12. Errors are known to occur in a digitized message in a communications channel; the 
probability an individual bit is incorrectly transmitted is 0.001 and the errors are 
assumed to be independent. 
(a) Find the probability that at most 2 errors occur in a sequence of 10 bits. 
(b) Find the mean and variance of the number of errors. 
(c) Find the probability of at most 2 errors in a message of 10,000 bits. 

13. In a small voting precinct, 100 voters favor candidate A and 80 voters oppose candidate 
A. What is the probability that a majority of a random sample of 4 voters will oppose 
candidate A? 

14. Customers arrive at a checkout counter in a supermarket at a rate of 20 per half hour, 
the number following a Poisson distribution. What is the probability that at most 5 
customers arrive in a period of 15 minutes? 

15. A manufacturer produces items that are good or defective, according to a binomial 
process where p is the probability an item is defective. Let X denote the number of 
items produced up to and including the second defective item. 
(a) Find an expression for the probability that X is even. 
(b) Now suppose that the sixth item is the second defective item produced. What is 
the most likely value for p? 

16. Thirty percent of the applicants for a position have advanced training in computer 
programming. Three jobs requiring advanced training are open. Find the probability that 
the third qualified applicant is found on the fifth interview, supposing that the applicants 
are interviewed sequentially and at random. 

17. A fair die is tossed until a 5 or a 6 appears. Compute the probability that the number 
of tosses is a multiple of 4. 

18. From a lot of 25 items, 5 of which are defective, 4 are chosen at random. Let X be 
the number of defectives found. Find the probability distribution of X if 
(a) 1. the items are chosen with replacement. 
2. the items are chosen without replacement. 
(b) In part (a) assume that the items are chosen with replacement until a defective item 
is found. What is the probability an odd number of drawings is necessary? 

19. Customers arrive at a computer store according to a Poisson process with 5 customers 
expected per hour. The sales force can accommodate at most 10 customers per hour; if 
more than 10 customers appear in an hour, the excess must be turned away. 
(a) What is the probability that customers are turned away in a 1-hour period? 
(b) Consider two independent 1-hour intervals. Let X denote the number of arrivals 
during the first hour and Y the number of arrivals during the second hour. Find 
P(X + Y < 8). 
20. Fifty chocolate chip cookies are to be made using 150 chocolate chips. The number of 
chocolate chips per cookie is a Poisson random variable. 
(a) What is the probability a cookie has at least 4 chocolate chips? 
(b) How many chocolate chips must be used in order to make the probability a cookie 
has at least one chocolate chip be at least 0.90? 

21. A pair of fair dice is rolled 180 times each hour in a dice game at a casino. What is the 
probability that at least 25 rolls give a sum of 7 during 1 hour? 

22. Telephone calls come into an answering service at an average rate of 3 per hour, the 
number of calls following a Poisson distribution. During the noon hour, only the first 
3 calls are answered. What is the expected number of calls answered during the noon 
hour? 

23. A box contains 4 bad and 6 good tubes. The tubes are checked by drawing a tube at 
random and not replacing it in the box. In how many ways can the fourth bad tube be 
found on the seventh drawing? 

24. A box contains three blue and four yellow marbles. Marbles are drawn out one at a 
time, the drawn marbles not being replaced. Drawings are made until all the marbles 
remaining in the box are of the same color. 
(a) Assign probabilities to the sample points and verify that their sum is 1. 
(b) What is the probability that only yellow marbles remain in the box when the 
sampling is finished? 

25. A tosses three coins that have probability \(p_A\) of coming up heads while B tosses two 
coins that have probability \(p_B\) of coming up heads. 
(a) Find an expression for the probability that A tosses more heads than B. 
(b) Show that the game is fair if the coins are fair. 

26. A player pays $A to play the following game: a coin, loaded to come up heads with 
probability 2/3, is tossed five times. Let X denote the number of heads. The player wins 
$(X + 1) if X is even and wins $(X − 1) if X is odd. Find A so that the game is fair. 

27. A machine, producing defective parts with probability 1/10, has produced five parts. 
Unknown to the operator of the machine, an adjustment to the machine increases this 
probability to 1/5. Ten parts are produced after the adjustment. What is the probability 
the output contains at least 2 defectives? Assume the parts are produced according to 
a binomial process. 

28. Past studies have shown that 2/3 of professional football players will sustain a 
permanent injury before retiring. To see if this proportion is true for current players, a sample 
of 100 retired professional football players showed that 80 of them had sustained 
permanent injuries. Using \(\alpha = 0.05\), test \(H_0: p = 2/3\) against \(H_a: p > 2/3\). 

29. In problem 28, find the size of \(\beta\) for the alternative p = 0.72. 

30. A study of 1200 college students showed that 44% of them said that their political 
views were similar to those of their parents. Find a 95% confidence interval for the 
true proportion of college students whose political views are similar to those of their 
parents. 

31. A drug is thought to be effective in 10% of patients with a certain condition. To test this 
hypothesis, the drug is given to 100 randomly chosen patients with the condition. If 8 
or more show some improvement, \(H_0: p = 0.10\) is accepted; otherwise, \(H_a: p < 0.10\) 
is accepted. Find the size of the test. 
32. Jack thinks that he can guess the correct answer to a multiple choice question with 
probability 1/2. Kaylyn thinks his probability is 1/3. To decide who is correct, Jack 
takes a multiple choice test, guessing the answer to each question. If he answers at 
least 40 out of 100 questions correctly, it will be decided that Jack is correct. Find \(\alpha\) 
and \(\beta\) for this test. 

33. A survey of 300 workers showed that 100 are self-employed. Find a 90% confidence 
interval for the proportion of workers who are self-employed. 

34. A management study showed that 1/3 of American office workers has his or her own 
office while 1/33 of Japanese office workers has his or her own office. The study was 
based on 300 American workers and 300 Japanese workers. Could the difference in 
these proportions only be apparent and due to sampling variability? [Use 90% 
confidence intervals.] 

35. The Internal Revenue Service says that the chance a United States Corporation will 
have its income tax return audited is 1 in 15. A sample of 75 corporate income tax 
returns showed that 6 were audited. Does the data support the Internal Revenue 
Service’s claim? Use \(\alpha = 0.05\). 

36. A survey of 400 children showed that 1/8 of them were on welfare. Find a 95% 
confidence interval for the true proportion of children on welfare. 

37. How large a sample is necessary to estimate the proportion of people who do not know 
whose picture is on the $1 bill to within 0.02 with probability 0.90? 

38. Three marbles are drawn without replacement from a bag containing three white, three 
red, and five green marbles. $1 is won for each red selected and $1 is lost for each 
white selected. No payoff is associated with the green marbles. Let X denote the net 
winnings from the game. Find the probability distribution function for X. 

39. Three fair dice are rolled. You as the bettor are allowed to bet $1 on the occurrence of 
one of the integers 1, 2, 3, 4, 5, or 6. If you bet on X and X occurs k times (k = 1, 2, 3), 
then you win $k; otherwise, you lose the $1 you bet. Let W represent the net winnings 
per play. 
(a) Find the probability distribution for W. 
(b) Find E(W). 
(c) If you could roll m dice, instead of 3 dice, what would your choice of m be? 

40. (a) Suppose that X is a Poisson random variable with parameter \(\lambda\). Find \(\lambda\) if 
P(X = 2) = P(X = 3). 
(b) Show that if X is a Poisson random variable with parameter \(\lambda\), where \(\lambda\) is an integer, 
then some two consecutive values of X have equal probabilities. 

41. Calls come into an office according to a Poisson process with 3 calls expected per hour. 
Suppose that the calls are answered independently, with the probability that a call is 
answered as 3/4. Find the probability that exactly 4 calls are answered in a 1-hour 
period. 

42. Let X be Poisson with parameter \(\lambda\). 
(a) Find a recursion for P(X = x + 1) in terms of P(X = x). 
(b) Use the recursion in part (a) to find \(\mu\) and \(\sigma^2\). 

43. Ten people are wearing badges numbered 1, 2, ..., 10. Three people are asked to leave 
the room. What is the probability that the smallest badge number among the three is 5? 
Continuous Random Variables 
and Probability Distributions 


INTRODUCTION 


Discrete random variables were discussed in Chapter 2. However, it is not always possible 
to describe all the possible outcomes of an experiment with a finite, or countably infinite, 
sample space. As an example, consider the wheel shown in Figure 3.1 where the numbers 
from 0 to 1 have been marked on the outside edge. 

The experiment consists of spinning the spinner and recording where the arrow stops. 
It would be natural here to consider the sample space, S, to be 


\[ S = \{x \mid 0 \le x \le 1\}. \]


S is infinite, but not countably infinite. 

Now the question arises, “What probability should be put on each of the points in S?” 
Surely, if the wheel is fair, each point should receive the same probability and the total 
probability should be 1. What value should that probability be? 

Suppose, for the sake of argument, that a probability of 0.0000000000000000000001 = 
\(10^{-22}\) is put on each point. It is easy to show that the circumference of the wheel contains 
more than \(10^{22}\) points, so we have used up more than the allotted probability of 1. So we 
conclude that the only possible assignment of probabilities is 


\[ P(X = x) = 0 \quad \text{for any } x \text{ in } S. \]


Now suppose that the wheel is loaded and that it is three times as likely that the arrow lands 
in the left-hand half of the wheel as in the right-hand half. We suppose that 

\[ P\left(X \ge \tfrac{1}{2}\right) = 3 \cdot P\left(X \le \tfrac{1}{2}\right). \]


Again we ask, “What probability should be put on each of the points in S?” Again, since 
there is still an uncountably infinite number of points in S, the answer is 


P(X =x) =0. 


Probability: An Introduction with Statistical Applications, Second Edition. John J. Kinney. 
© 2015 John Wiley & Sons, Inc. Published 2015 by John Wiley & Sons, Inc. 
Figure 3.1 The spinner. 


Definition If a random variable X takes values on an interval or intervals, then X is said 
to be a continuous random variable. 


Of course, P(X = x) = 0 for any continuous random variable X. 

So the probability distribution function is not informative in the continuous case, since, 
for example, we cannot distinguish between fair and loaded wheels! The fault, however, 
lies not in the answer, but in the question. Perhaps we can devise a question whose answer 
carries more information for us. 

Consider now a function, f(x), which we will call a probability density function. The 
abbreviation is again pdf (the same abbreviation used for probability distribution function), 
but the word density connotes a continuous distribution. Here are the properties we desire 
of the new function f(x): 


1. \(f(x) \ge 0\) 
2. \(\int_{-\infty}^{\infty} f(x)\,dx = 1\) 
3. If \(a \le b\), then \(\int_a^b f(x)\,dx = P(a \le X \le b)\) 


These properties are quite analogous to those for a discrete random variable. Property (3) 
indicates that areas under f(x) are probabilities. f(x) must be nonnegative, else we encounter 
negative probabilities, so property (1) must hold. Property (2) indicates that the total prob- 
ability on the sample space is 1. 

What is f(x) for the fair wheel? Since the circumference of the wheel contains the 
interval [0, 1/4], and since the wheel is a fair one, we would like \(P(0 \le X \le \tfrac{1}{4})\) to be \(\tfrac{1}{4}\), 
so we must have \(\int_0^{1/4} f(x)\,dx = \tfrac{1}{4}\). Many functions have this property. But we would like 
any interval of length \(\tfrac{1}{4}\) to have probability \(\tfrac{1}{4}\). In addition, we would like an interval of 
length a, say, to have probability a for \(0 \le a \le 1\). The only function that has this property, 
in addition to satisfying the above-mentioned properties (1) and (2), is a uniform probability 
density function: 

\[ f(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise.} \end{cases} \]
For the loaded wheel, where we want \(P(X \ge \tfrac{1}{2}) = 3P(X \le \tfrac{1}{2})\), consider (among many other 
choices) the function 

\[ f(x) = \begin{cases} 2x, & 0 \le x \le 1 \\ 0, & \text{otherwise.} \end{cases} \]

Then \(P(X \ge \tfrac{1}{2}) = \int_{1/2}^{1} 2x\,dx = \tfrac{3}{4}\), so that \(P(X \le \tfrac{1}{2}) = \tfrac{1}{4}\) and so \(P(X \ge \tfrac{1}{2}) = 3P(X \le \tfrac{1}{2})\). 


A graph of f(x) is shown in Figure 3.2. 

It is easy to verify that f(x) also satisfies properties (1) and (2) for a probability 
density function. 

We see that f(x), the probability density function, distinguishes continuous random 
variables in an informative way while the probability distribution function (which is useful 
for discrete random variables) does not. 
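
The contrast between the fair and loaded wheels can be checked numerically. The sketch below uses a simple midpoint rule; the helper `integrate` and the function names `fair` and `loaded` are illustrative, not part of the text. It takes the fair density to be 1 on [0, 1] and the loaded density to be 2x on [0, 1], as described above.

```python
def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def fair(x):
    return 1.0 if 0 <= x <= 1 else 0.0

def loaded(x):
    return 2 * x if 0 <= x <= 1 else 0.0

for name, f in (("fair", fair), ("loaded", loaded)):
    total = integrate(f, 0, 1)      # property (2): total area should be 1
    left = integrate(f, 0, 0.5)     # P(X <= 1/2)
    right = integrate(f, 0.5, 1)    # P(X >= 1/2)
    print(name, round(total, 6), round(left, 6), round(right, 6))
```

Both densities have total area 1, but the areas over the two halves of the interval differ for the loaded wheel (1/4 and 3/4), which is exactly the information that P(X = x) = 0 cannot convey.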

To illustrate this point further, suppose the wheel has been rigged so that it is impossible 
for the pointer to stop between 0 and \(\tfrac{1}{4}\) while it is still fair for the remainder of the 
circumference of the wheel. It follows then that 

\[ f(x) = \begin{cases} \tfrac{4}{3}, & \tfrac{1}{4} \le x \le 1 \\ 0, & \text{otherwise.} \end{cases} \]

This function satisfies all three properties for a probability density function. Its graph is 
shown in Figure 3.3. For this rigged wheel, 

\[ P\left(X \ge \tfrac{1}{2}\right) = \int_{1/2}^{1} \tfrac{4}{3}\,dx = \tfrac{2}{3}. \]


It is also useful to define a cumulative distribution function (often abbreviated to dis- 
tribution function), which is defined as 


\[ F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt. \]


We used F(x) in Chapter 2. 


Figure 3.2 Probability density function for the loaded wheel. 
Figure 3.3 Probability density function for the rigged wheel. 


The function F(x) accumulates probabilities for a probability density function in 
exactly the same way as F(x) accumulated probabilities in the discrete case. 
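As an example, if 

\[ f(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise,} \end{cases} \]

then, being careful to distinguish the various regions in which x can be found, we find that 

\[ F(x) = \begin{cases} \int_{-\infty}^{x} 0\,dt = 0, & x \le 0 \\ \int_{-\infty}^{0} 0\,dt + \int_{0}^{x} 1\,dt = x, & 0 \le x \le 1 \\ \int_{-\infty}^{0} 0\,dt + \int_{0}^{1} 1\,dt + \int_{1}^{x} 0\,dt = 1, & x \ge 1. \end{cases} \]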


A graph of F(x) is shown in Figure 3.4. 


Figure 3.4 Cumulative distribution function for the fair wheel. 
It is easy to see that F(x), for any probability density function, f(x), has the following 
properties: 

\[ \lim_{x \to -\infty} F(x) = 0 \quad \text{and} \quad \lim_{x \to \infty} F(x) = 1, \]

\[ \text{If } a \le b, \text{ then } F(a) \le F(b), \]

\[ P(a \le X \le b) = P(a < X \le b) = P(a < X < b) = F(b) - F(a), \quad \text{and} \]

\[ \frac{d[F(x)]}{dx} = f(x), \quad \text{provided the derivative exists.} \]


Mean and Variance 


In analogy with the discrete case, the mean and variance of a continuous random variable 
with probability density function f(x) are defined as 


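\[ E(X) = \mu = \int_{-\infty}^{\infty} x \cdot f(x)\,dx \quad \text{and} \tag{3.1} \]

\[ \operatorname{Var}(X) = \sigma^2 = E(X - \mu)^2 = \int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x)\,dx \tag{3.2} \]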


provided that the integrals converge. 

These definitions are similar to those used for discrete random variables where the 
values of the random variable were weighted with their probabilities and the results added. 
Now it is natural to integrate so that the definitions for the mean and the variance in the 
continuous case appear to be analogous to their counterparts in the discrete case. 

We can expand the definition of Var(X) to find that 


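\[ \operatorname{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x)\,dx = \int_{-\infty}^{\infty} (x^2 - 2\mu x + \mu^2) \cdot f(x)\,dx \]

\[ = \int_{-\infty}^{\infty} x^2 \cdot f(x)\,dx - 2\mu \int_{-\infty}^{\infty} x \cdot f(x)\,dx + \mu^2 \int_{-\infty}^{\infty} f(x)\,dx \]

\[ = \int_{-\infty}^{\infty} x^2 \cdot f(x)\,dx - 2\mu^2 + \mu^2, \]

so \(\operatorname{Var}(X) = E(X^2) - [E(X)]^2\). 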


This is the same result we obtained for a discrete random variable. 
Other properties of the mean and variance are as follows: 


\[ E(aX + b) = aE(X) + b \]

\[ \operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X) \]


To show these properties, first consider E(aX + b). By definition, 


\[ E(aX + b) = \int_{-\infty}^{\infty} (ax + b) \cdot f(x)\,dx. \]

Expanding and simplifying the integrals, we find that 

\[ E(aX + b) = a \int_{-\infty}^{\infty} x \cdot f(x)\,dx + b \int_{-\infty}^{\infty} f(x)\,dx, \]

so \(E(aX + b) = aE(X) + b\), or \(E(aX + b) = a \cdot \mu + b\), 


establishing the first property. 
Now 


\[ \operatorname{Var}(aX + b) = E[(aX + b) - (a\mu + b)]^2 = E[a^2 (X - \mu)^2] = a^2 E[(X - \mu)^2], \]

so \(\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X)\), 
establishing the second property. 
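
These two properties, and the shortcut formula for the variance, can be checked numerically. The following is only a sketch: it assumes the loaded-wheel density 2x on [0, 1], uses a midpoint rule, and the constants a = 3 and b = -1 are arbitrary illustrative choices.

```python
def integrate(g, lo, hi, steps=10_000):
    """Midpoint-rule approximation to the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))

def f(x):
    return 2 * x            # loaded-wheel density on [0, 1]

def expect(g):
    """E[g(X)] = integral of g(x) f(x) over the support [0, 1]."""
    return integrate(lambda x: g(x) * f(x), 0.0, 1.0)

mu = expect(lambda x: x)                         # E(X)
var = expect(lambda x: (x - mu) ** 2)            # definition (3.2)
var_shortcut = expect(lambda x: x * x) - mu**2   # E(X^2) - [E(X)]^2

a, b = 3.0, -1.0                                 # arbitrary constants
mu_y = expect(lambda x: a * x + b)               # E(aX + b) computed directly
var_y = expect(lambda x: (a * x + b - mu_y) ** 2)

print(mu, var, var_shortcut)
print(mu_y, a * mu + b)
print(var_y, a * a * var)
```

For this density the mean is 2/3 and the variance is 1/18, and the directly computed values for aX + b agree with a·μ + b and a²·Var(X) up to the accuracy of the numerical integration.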


The definitions of the mean and variance are dependent on the convergence of the 
integrals involved. To show that this does not always happen, consider the density 


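\[ f(x) = \frac{1}{\pi(1 + x^2)}, \quad -\infty < x < \infty. \]

The fact that 

\[ \int_{-\infty}^{\infty} \frac{dx}{\pi(1 + x^2)} = \frac{1}{\pi} \arctan x \,\Big|_{-\infty}^{\infty} = \frac{1}{\pi}\left[\frac{\pi}{2} - \left(-\frac{\pi}{2}\right)\right] = 1 \]

together with the fact that f(x) > 0 establishes f(x) as a probability density function. 
However, 

\[ E(X) = \int_{-\infty}^{\infty} \frac{x}{\pi(1 + x^2)}\,dx = \frac{1}{2\pi} \ln(1 + x^2) \,\Big|_{-\infty}^{\infty}, \quad \text{which does not exist.} \]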


The random variable X in this case has no variance as well; in fact \(E[X^k]\) does not exist for 
any k. The probability density is called the Cauchy density. 
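
The contrast between the convergent total probability and the divergent mean can be made concrete with the closed-form antiderivatives. The sketch below (function names are illustrative) evaluates the integral of the density over [-T, T], which is (2/π) arctan T, and the integral of x·f(x) over [0, T], which is ln(1 + T²)/(2π), for increasing T.

```python
from math import atan, log, pi

def total_prob(T):
    """Integral of 1/(pi*(1 + x^2)) over [-T, T], from the arctan antiderivative."""
    return 2 * atan(T) / pi

def half_mean(T):
    """Integral of x/(pi*(1 + x^2)) over [0, T], from the log antiderivative."""
    return log(1 + T * T) / (2 * pi)

for T in (10.0, 1e3, 1e6, 1e12):
    print(T, total_prob(T), half_mean(T))
```

The first column of results approaches 1, while the second keeps growing like a logarithm. Since the integral over the positive half-line diverges (and likewise over the negative half-line), the improper integral defining E(X) does not converge, even though any symmetric truncation over [-T, T] gives 0.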
We now turn to an example of a better behaved probability density function. 


Example 3.1.1 


Given the loaded wheel for which 


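\[ f(x) = \begin{cases} 2x, & 0 \le x \le 1 \\ 0, & \text{otherwise,} \end{cases} \]

we find that 

\[ F(x) = \begin{cases} 0, & x \le 0 \\ x^2, & 0 \le x \le 1 \\ 1, & x \ge 1. \end{cases} \]
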
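If we want to calculate \(P(\tfrac{1}{2} \le X \le \tfrac{3}{4})\), we could proceed in two different ways. First, 

\[ P\left(\tfrac{1}{2} \le X \le \tfrac{3}{4}\right) = \int_{1/2}^{3/4} 2x\,dx = \tfrac{5}{16}, \]

where f(x) was used in the calculation. We could as easily have used F(x): 

\[ P\left(\tfrac{1}{2} \le X \le \tfrac{3}{4}\right) = F\left(\tfrac{3}{4}\right) - F\left(\tfrac{1}{2}\right) = \tfrac{5}{16}, \]

giving the same result. 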


It would appear from this example that F(x) is superfluous, since any probability can be 
found from a knowledge of f(x) alone (and in fact f(x) is needed to determine F(x)!). While 
this is true, it happens that there are other important uses to which F(x) will be put later 
and so we introduce the function now. To pique the reader’s interest, we pose the following 
question: the loaded wheel above is spun, X being the result. The player then wins \(3X^2\) dollars. 
If the owner of the wheel wishes to make, on average, $0.50 per play of the game, what 
is a fair price to charge to play the game? We will answer this question later, making use 
of F(x), although the reader may be able to answer it now. The function F(x) also plays a 
leading role in reliability theory, which is considered later in this chapter. 


Example 3.1.2 


A random variable X has probability density function 


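\[ f(x) = \begin{cases} k \cdot (2 - x), & 0 \le x \le 2 \\ 0, & \text{otherwise.} \end{cases} \]

The constant k, of course, is a special value that makes the total area under the curve 1. 
It follows that 

\[ \int_0^2 k \cdot (2 - x)\,dx = 1. \]

It follows from this that k = 1/2. 
Now if we wish to find a conditional probability, for example, \(P(X \ge 1 \mid X \ge \tfrac{1}{2})\), first 
note that the set of values where \(X \ge \tfrac{1}{2}\) does not have area 1, so, as in the discrete case, 

\[ P\left(X \ge 1 \mid X \ge \tfrac{1}{2}\right) = \frac{P\left(X \ge 1 \text{ and } X \ge \tfrac{1}{2}\right)}{P\left(X \ge \tfrac{1}{2}\right)}. \]

This becomes 

\[ P\left(X \ge 1 \mid X \ge \tfrac{1}{2}\right) = \frac{P(X \ge 1)}{P\left(X \ge \tfrac{1}{2}\right)}. \]
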


We calculate this conditional probability as 4/9. 
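
The arithmetic in this example can be reproduced exactly. The sketch below assumes the density is k(2 - x) on the interval from 0 to 2 (the reading consistent with k = 1/2) and uses the antiderivative 2x - x²/2 with exact fractions; the helper `area` is an illustrative name.

```python
from fractions import Fraction

def area(a, b):
    """Integral of (2 - x) from a to b, using the antiderivative 2x - x^2/2."""
    def G(x):
        return 2 * x - x * x / 2
    return G(Fraction(b)) - G(Fraction(a))

k = 1 / area(0, 2)                            # makes the total area equal to 1
p_ge_1 = k * area(1, 2)                       # P(X >= 1)
p_ge_half = k * area(Fraction(1, 2), 2)       # P(X >= 1/2)
conditional = p_ge_1 / p_ge_half              # P(X >= 1 | X >= 1/2)

print(k, p_ge_1, p_ge_half, conditional)
```

This gives k = 1/2, P(X ≥ 1) = 1/4, and P(X ≥ 1/2) = 9/16, so the ratio is 4/9, in agreement with the value stated above.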
Before turning to the exercises, we note that there are many important special 
probability density functions of great interest since they arise in interesting and practical 
situations. We will consider some of these in detail in the remainder of this chapter. 


A Word on Words 


We considered, for a discrete random variable, the probability distribution function as well 
as the cumulative distribution function. For continuous random variables, the terms 
probability density function and cumulative distribution function are in common usage. 

We will continue to make the distinction here between discrete and continuous random 
variables by making a distinction in the language we use to refer to them. In part, this is 
because the mathematics useful for discrete random variables is quite different from that 
for continuous random variables; the language serves to alert us to these distinctions. One 
would not want to integrate a discrete function nor try to sum a continuous one! 

While we will be consistent about this, we will also refer to random variables, either 
discrete or continuous as following or having a certain probability distribution function. 
So we will refer to a random variable as following a binomial distribution or another ran- 
dom variable as following a Cauchy distribution although one is discrete and the other is 
continuous. 


EXERCISES 3.1 


1. A loaded wheel has probability density function 
\[ f(x) = 3x^2, \quad 0 \le x \le 1. \]


(a) Show that f(x) is a probability density function. 
. 1 3 

(b) Find P(5 <x <2). 

(c) Find P(X > 2). 


(d) Find c so that P(X >c) = =. 
2. A random variable X has probability density function 


Com a O0<x<l 


0, otherwise. 
(a) Find k. 
(b) Find P (x > ?). 
(c) Calculate P (x > : [x > ; ) 
3. If
\[ f(x) = \begin{cases} k \sin x, & 0 < x < \pi, \\ 0, & \text{otherwise.} \end{cases} \]


(a) Show that k = 1/2. 
(b) Calculate P (x < =), 


(c) Find b so that P(X < b) = .
4. A random variable X has probability density function


ae oa O<x<l, 


0, otherwise. 


(a) Find the mean, \( \mu \), and the variance, \( \sigma^2 \), for X.


(b) Calculate exactly \( P(\mu - 2\sigma < X < \mu + 2\sigma) \) and compare your answer with the
result given by Tchebycheff’s inequality.


5. The length of life, X, in days, of a heavily used electric motor has probability density
function
\[ f(x) = \begin{cases} 3e^{-3x}, & x > 0 \\ 0, & \text{otherwise.} \end{cases} \]


(a) Find the probability the motor lasts at least 1/2 of a day, given that it has lasted 1/4 
of a day. 


(b) Find the mean and variance for X. 


6. A random variable X has probability density function 


Poe a x> 0. 


0, otherwise. 


(a) Find k. 
(b) Graph f(x) 
(c) Find \( \mu \) and \( \sigma \).


7. The distribution function for a random variable, X, is 


0, x<-4 
- -4<x<-3 
3” < 
3 

F(x) = 3° —3<x<2 
=, 25K <5 
1, «25 


(a) Find P(X = 2). 
(b) Find P(-3 <x < 2). 


8. A continuous random variable, X, has probability density function 


fey RO) 0<x<l 


0, otherwise. 


(a) Find k. 
(b) Find the mean and variance of X.
(c) Four independent observations of X are made. What is the probability that exactly
two of these are greater than 3/4?


~+-, —2<x<0 


=4 O<x<2 

0, otherwise. 

(a) Show that f(x) is a probability density function. 
(b) Find P[|X| < 1]. 

(c) Find \( \mu \) and \( \sigma^2 \).


10. A random variable, X, has probability density function
\[ f(x) = \begin{cases} x, & 0 < x < 1 \\ 2 - x, & 1 < x < 2 \\ 0, & \text{otherwise.} \end{cases} \]


(a) Find E[X]. 
(b) Find Var[X]. 
(c) Find F(x), being sure to specify this for all values of x. 


(d) What is the probability that at least two of three independent observations on X are 
greater than 1/2? 


11. The length of time, Y, in hours, a student takes to complete an examination is a random
variable with
\[ g(y) = \begin{cases} cy^2 + y, & 0 < y < 1 \\ 0, & \text{otherwise.} \end{cases} \]


(a) Find c. 
(b) Find the cumulative distribution function, G(y). 
(c) Find an expression for P(Y > y) for any value of y. 


12. As a measure of intelligence, mice are timed when going through a maze to reach a
reward of food. The time (in seconds) required for any mouse is a random variable Y
with probability density function
\[ f(y) = \begin{cases} \dfrac{10}{y^2}, & y > 10 \\ 0, & \text{otherwise.} \end{cases} \]


(a) Show that f(y) has the properties of a probability density function. 

(b) Find PO < Y < 99). 

(c) Find the probability a mouse requires at least 15 seconds to traverse the maze if it 
is known that the mouse requires at least 12 seconds. 


13. A continuous random variable has probability density function 


2 _ 
for 3<x<3 


0, otherwise.

(a) Find the mean and variance of X.

(b) Verify Tchebycheff’s inequality for the case k = 2.

14. Suppose the distance X between a point target and a shot aimed at the point in a video
game is a continuous random variable with probability density function


porn {50-2 -I<x<] 


0, otherwise. 


(a) Find the mean and variance of X. 
(b) Use Tchebycheff’s inequality to give a bound for P Cs |< ;] : 


15. If the loaded wheel with f(x) = 2x, 0 < x < 1, is spun three times, it can be shown
that the probability density function for Y, the smallest of the three values obtained, is
\( g(y) = 6y(1 - y^2)^2 \), \( 0 < y < 1 \). Find the mean and variance for Y.


16. Show that

g(y) = 1260 y*(1-yy, 0 < y < 1
       0, otherwise

is a probability density function. Then find the mean and variance for Y.


17. Use your computer algebra system to draw a random sample of 100 observations from
the distribution
\[ f(x) = \begin{cases} 1, & 0 < x < 1 \\ 0, & \text{otherwise.} \end{cases} \]


The random variable X here is said to follow a uniform distribution on the interval 

[0, 1]. 

(a) Enumerate the observations in each of the categories 0 < x < 0.1, 0.1 < x < 0.2,
and so on. Do the observations appear to be uniform?

(b) We will show in Chapter 4 that if X is uniform on the interval [0, 1] and if
\( Y = X^{1/2} \), then the probability density function for Y is g(y) = 2y, 0 < y < 1. So the
sample in part (a) can be used to simulate a random sample from the loaded wheel
discussed in Section 3.1. Show a sample from the loaded wheel. Graph the sample
values and decide whether or not the sample appears to have been selected from
the loaded wheel.

18. Show that

g(y) = Gorey d=", OS ys
       0, otherwise

is a probability density function for n a positive integer and r = 1, 2, ..., n.


19. Given
2 
- O<x<l 
2 
a x- =) » ween 2 
fay=44 V2 
2 
a 2<x<3 
0, otherwise. 


(a) Sketch f(x) and show that it is a probability density function. 
(b) Find the mean and variance of X. 




20. Suppose that X is a random variable with probability distribution function F(x) whose 
domain is x > 0. 


(a) Show that \( \int_0^\infty [1 - F(x)]\,dx = E(X) \). [Hint: Write the integral as a double integral
and then change the order of integration.]
(b) Write an integral involving F(x) whose value is E(X?). 
21. Prove formulas 3.1 and 3.2. 


3.2 UNIFORM DISTRIBUTION 


The fair wheel, where f(x) = 1, 0 < x < 1, is an example of a uniform probability density
function. In general, if

\[ f(x) = \begin{cases} \dfrac{1}{b - a}, & a < x < b \\ 0, & \text{otherwise,} \end{cases} \]

then X is said to have a uniform probability distribution. This is the continuous analogue of
the discrete uniform distribution considered in Chapter 2. A graph is shown in Figure 3.5.
The mean and variance are calculated as follows:


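\[ E(X) = \int_a^b \frac{x}{b - a}\,dx = \frac{a + b}{2} \quad \text{and} \]

\[ \mathrm{Var}(X) = E(X^2) - (E(X))^2 \]

\[ \mathrm{Var}(X) = \int_a^b \frac{x^2}{b - a}\,dx - \left(\frac{a + b}{2}\right)^2 = \frac{b^3 - a^3}{3(b - a)} - \left(\frac{a + b}{2}\right)^2 = \frac{(b - a)^2}{12}. \]
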
Figure 3.5 Uniform distribution on the interval [a, b].

Example 3.2.1

Suppose X is uniform on the interval [1, 5]. Then

\[ f(x) = \frac{1}{4}, \quad 1 < x < 5. \]

Suppose also that we have an observation that is at least 2. What is the probability the
observation is at least 3?
We need

\[ P(X \ge 3 \mid X \ge 2) = \frac{P(X \ge 3)}{P(X \ge 2)} = \frac{2/4}{3/4} = \frac{2}{3}. \]
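
The uniform results can be checked with exact arithmetic. The short Python sketch below is an illustration only and is not part of the original text; the helper names `uniform_mean`, `uniform_var`, and `uniform_tail` are assumptions made for the example. It confirms the formulas for the mean and variance on the interval [1, 5] and reproduces the conditional probability of Example 3.2.1.

```python
from fractions import Fraction

def uniform_mean(a, b):
    """Mean of a uniform distribution on [a, b]: (a + b) / 2."""
    return Fraction(a + b, 2)

def uniform_var(a, b):
    """Variance as E(X^2) - (E(X))^2, with E(X^2) = (b^3 - a^3) / (3(b - a))."""
    second_moment = Fraction(b**3 - a**3, 3 * (b - a))
    return second_moment - uniform_mean(a, b) ** 2

def uniform_tail(x, a, b):
    """P(X >= x) for X uniform on [a, b], assuming a <= x <= b."""
    return Fraction(b - x, b - a)

a, b = 1, 5
mean = uniform_mean(a, b)
var = uniform_var(a, b)                                  # should equal (b - a)^2 / 12
cond = uniform_tail(3, a, b) / uniform_tail(2, a, b)     # P(X >= 3 | X >= 2)
print(mean, var, cond)
```

Using `Fraction` rather than floating point keeps the comparison with \( (b - a)^2/12 \) exact.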


Example 3.2.2 


The wheel in Example 3.2.1 is spun again, the result being X. Now we spin the wheel until
an observation greater than the value of X is found, say Y. What is the expected value for Y?

Since the wheel is a fair one, we suppose that, given X = x, Y is uniformly distributed on (x, 5) so
that

\[ f(y) = \frac{1}{5 - x}, \quad x < y < 5. \]

Then

\[ E(Y) = \int_x^5 \frac{y}{5 - x}\,dy = \frac{25 - x^2}{2(5 - x)} = \frac{5 + x}{2}, \]

a natural result since the central value on the interval (x, 5) is \( \frac{5 + x}{2} \).


EXERCISES 3.2 


1. The arrival times of customers at an automobile repair shop are uniformly distributed
over the interval from 8 a.m. to 9 a.m. If a customer has not arrived by 8:30 a.m., what
is the probability he will arrive after 8:45 a.m.?


2. A traffic light is red for 60 seconds, yellow for 10 seconds, and green for 90 seconds. 
Assuming that arrival times at the light are uniformly distributed, what is the probability 
a car stops at the light for at most 30 seconds? 


3. A crude Geiger counter records the number of radioactive particles a substance emits, 
but often errs in the number of particles recorded. If the error is uniformly distributed on 
the interval, what is the probability the counter will underrecord the number of particles 
emitted?
4. Suppose that X is a random variable uniformly distributed on the interval (−2, 2).
. 1 
(a) Find P (= <2). 
3 1 
(b) Find P (3 <2). 
5. Let X be uniformly distributed over the intervals (0, 1) and (2, 3).
If \( P(0 < X < 1) = 2 \cdot P(2 < X < 3) \), find \( P(\mu - \sigma < X < \mu + \sigma) \).


6. The termination of a chemical reaction occurs at a random time T between 6 and 7.5
hours after the start of the experiment. The time follows a uniform distribution.


(a) What is the probability the reaction lasts at least 6.5 hours and no more than 6.75 
hours? 


(b) If the reaction is run four independent times, what is the probability that in exactly 
one of the four replications of the experiment the reaction will last no more than 

6.5 hours? 
7. Suppose that X is uniformly distributed on 0 < x < 12. Use Tchebycheff’s inequality 
to establish a bound on P ( ur =e -0<X< yt Sy : s) and then verify that the 


bound is correct. 


8. Let X be a uniform random variable on the interval | < x < b Determine b so that 
a, = An, 
9. A random variable X is uniformly distributed on the interval —1 <x < 1. Find 
P (; <xX< ?). 


10. Find the probability that at least two of four random observations of a uniform random 
variable on the interval [0,10] are greater than 7. 


11. Suppose X is a uniform random variable on the interval [a,a +2]. Find a if P[e* < 
1.765] = 7 


3.3 EXPONENTIAL DISTRIBUTION 


Example 3.3.1 


Customers in a checkout line at a supermarket find that the times the checkers take in the
checkout process follow the probability density function

\[ f(x) = e^{-x}, \quad x \ge 0, \]

where X is measured in minutes. We see that \( f(x) \ge 0 \) and that \( \int_0^\infty e^{-x}\,dx = 1 \), so f(x)
defines a probability density function. Here, f(x) is an example of an exponential probability
density function. What is the probability a customer’s checkout time is at least k minutes?
This is

\[ P(X \ge k) = \int_k^\infty e^{-x}\,dx = e^{-k}. \tag{3.3} \]


Another calculation yields a somewhat surprising result. Suppose the checkout time has
been at least s minutes. What is the probability it will be at least s + t minutes? Using the
formula for conditional probability and formula (3.3) earlier, we have

\[ P(X \ge s + t \mid X \ge s) = \frac{P(X \ge s + t)}{P(X \ge s)} = \frac{e^{-(s + t)}}{e^{-s}} = e^{-t}, \]

so that

\[ P(X \ge s + t \mid X \ge s) = P(X \ge t). \]



Consequently, the probability the customer waits t minutes more, given that the waiting 
time has been at least s minutes, is the same as the probability the waiting time is ini- 
tially t minutes! The fact that the customer has been waiting s minutes appears not to affect 
the future waiting time at all. We call this property of the exponential probability distri- 
bution the memoryless property. (It can be shown that the exponential probability density 
function is the only probability density function for which the memoryless property holds. 
Among discrete distributions, the geometric probability distribution is the only probability 
distribution function for which the property holds.) 
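
The memoryless property is easy to confirm numerically. The following Python sketch is an illustration added here, not the author's code; the function names `exp_tail` and `conditional_tail` and the sample values of s and t are assumptions. It computes the conditional probability directly from the definition and compares it with the unconditional tail probability.

```python
import math

def exp_tail(x, lam=1.0):
    """P(X >= x) = exp(-lam * x) for the density lam * exp(-lam * x), x >= 0."""
    return math.exp(-lam * x)

def conditional_tail(s, t, lam=1.0):
    """P(X >= s + t | X >= s), computed from the definition of conditional probability."""
    return exp_tail(s + t, lam) / exp_tail(s, lam)

t = 1.5
for s in (0.5, 2.0, 10.0):
    # The two printed probabilities agree no matter how long the wait s has been.
    print(s, conditional_tail(s, t), exp_tail(t))
```

Whatever value of s is tried, the conditional probability matches P(X ≥ t) up to floating-point rounding.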
A more general form of the exponential probability density function is

\[ f(x) = \lambda e^{-\lambda (x - a)}, \quad x \ge a, \ \lambda > 0. \]

Our checkout time example is a special case where \( \lambda = 1 \) and a = 0. The graph is shown in
Figure 3.6 where a has been taken to be 2.

We note that \( f(x) \ge 0 \) and that \( \int_a^\infty f(x)\,dx = \int_a^\infty \lambda e^{-\lambda (x - a)}\,dx = 1 \), so f(x) satisfies the properties
of a probability density function.


Mean and Variance 


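For \( f(x) = \lambda e^{-\lambda (x - a)} \), \( x \ge a \), \( \lambda > 0 \), direct calculation shows that

\[ E(X) = \int_a^\infty x \cdot f(x)\,dx = a + \frac{1}{\lambda} \quad \text{and that} \]

\[ E(X^2) = \int_a^\infty x^2 f(x)\,dx = a^2 + \frac{2a}{\lambda} + \frac{2}{\lambda^2}, \quad \text{so that} \]

\[ \mathrm{Var}(X) = \left(a^2 + \frac{2a}{\lambda} + \frac{2}{\lambda^2}\right) - \left(a + \frac{1}{\lambda}\right)^2, \]

from which \( \mathrm{Var}(X) = \dfrac{1}{\lambda^2} \).

Figure 3.6 An exponential distribution.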


Distribution Function 


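For the exponential density \( f(x) = \lambda e^{-\lambda (x - a)} \), \( x \ge a \), \( \lambda > 0 \),

\[ F(x) = \int_a^x \lambda e^{-\lambda (t - a)}\,dt = 1 - e^{-\lambda (x - a)}, \quad x \ge a. \]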
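
The mean, variance, and distribution function of the shifted exponential density can be checked by numerical integration. The Python sketch below is illustrative only; the values λ = 2 and a = 2, the midpoint-rule helper `integrate`, and the truncation point `upper` are all assumptions chosen for the example.

```python
import math

def shifted_exp_pdf(x, lam, a):
    """Density lam * exp(-lam * (x - a)) for x >= a, and 0 otherwise."""
    return lam * math.exp(-lam * (x - a)) if x >= a else 0.0

def integrate(g, lo, hi, n=100_000):
    """Midpoint-rule approximation to the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

lam, a = 2.0, 2.0            # illustrative parameter values
upper = a + 40.0 / lam       # the tail beyond this point is of order e^-40

mean = integrate(lambda x: x * shifted_exp_pdf(x, lam, a), a, upper)
second = integrate(lambda x: x * x * shifted_exp_pdf(x, lam, a), a, upper)
var = second - mean**2
cdf_at_3 = integrate(lambda x: shifted_exp_pdf(x, lam, a), a, 3.0)

print(mean, a + 1 / lam)                          # numerical mean vs. a + 1/lam
print(var, 1 / lam**2)                            # numerical variance vs. 1/lam^2
print(cdf_at_3, 1 - math.exp(-lam * (3.0 - a)))   # numerical F(3) vs. closed form
```

Each printed pair should agree to several decimal places; the small differences come from the integration rule, not from the formulas.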


Example 3.3.2 


Refer again to the checkout line and the waiting time density

\[ f(x) = e^{-x}, \quad x \ge 0. \]


Assume that customers’ waiting times are independent. What is the probability that, of the 
next 5 customers, at least 3 will have waiting times in excess of 2 minutes? 

There are two random variables here. Let X be the waiting time for an individual cus- 
tomer, and Y be the number of customers who wait at least 2 minutes. Here X is exponential; 
since the waiting times are independent and P(X > 2) is the same for every customer, Y is 
binomial. Note that while X is continuous, Y is a discrete random variable. 

It is easiest to start with X, where P(X > 2) determines p in the binomial distribution. 


\[ P(X \ge 2) = \int_2^\infty e^{-x}\,dx = e^{-2}. \]

Then

\[ P(Y \ge 3) = \sum_{y=3}^{5} \binom{5}{y} \left(e^{-2}\right)^{y} \left(1 - e^{-2}\right)^{5 - y}. \]


The value of this expression is 0.020028, so the event is not very likely. 
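
The two-stage calculation, an exponential tail probability feeding a binomial sum, can be reproduced in a few lines. This Python sketch is an added illustration; the helper name `binom_pmf` is an assumption, and `math.comb` requires Python 3.8 or later.

```python
import math

p = math.exp(-2)   # P(X >= 2) for the waiting time density e^(-x), x >= 0
n = 5              # number of customers observed

def binom_pmf(y, n, p):
    """Binomial probability of exactly y successes in n independent trials."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)

# P(Y >= 3): at least 3 of the 5 customers wait more than 2 minutes.
prob_at_least_3 = sum(binom_pmf(y, n, p) for y in range(3, n + 1))
print(round(prob_at_least_3, 6))   # the text reports 0.020028
```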


Example 3.3.3 


A radioactive source is emitting particles according to a Poisson distribution with 14 par- 
ticles expected to be emitted per minute. The source is observed until the first particle is 
emitted. What is the probability density function for this random variable? 

Again we have two variables in the problem. If X denotes the number of particles emitted
in 1 minute, then X is Poisson with parameter 14. However, we do not know the time
interval until the first particle is emitted. This is also a random variable, which we call
Y. Note that Y is a continuous random variable. If y minutes pass before the first particle
is emitted, then there must be no emissions in the first y minutes. Since the number of
emissions in y minutes is Poisson with parameter 14y, it follows that

\[ P(Y \ge y) = e^{-14y}. \]

We conclude that

\[ F(y) = P(Y \le y) = 1 - e^{-14y} \]

and so

\[ f(y) = \frac{dF(y)}{dy} = 14 e^{-14y}, \quad y \ge 0. \]

This is an exponential density. In this example, note that X is discrete while Y is
continuous.
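
The step from the Poisson count to the exponential waiting time can be traced numerically. The Python sketch below is an illustration under stated assumptions: the function names and the finite-difference step `h` are hypothetical choices, and the derivative is approximated rather than computed symbolically.

```python
import math

RATE = 14.0   # expected emissions per minute, as in the example

def prob_no_emission(y):
    """P(Y > y): Poisson probability of 0 events when the mean count is RATE * y."""
    mean = RATE * y
    return math.exp(-mean) * mean**0 / math.factorial(0)

def cdf(y):
    """F(y) = P(Y <= y) = 1 - P(no emission in the first y minutes)."""
    return 1.0 - prob_no_emission(y)

def density_numeric(y, h=1e-6):
    """Central-difference approximation to dF/dy."""
    return (cdf(y + h) - cdf(y - h)) / (2 * h)

for y in (0.01, 0.05, 0.2):
    # Numerical derivative of F next to the closed form 14 * exp(-14 * y).
    print(y, density_numeric(y), RATE * math.exp(-RATE * y))
```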
Example 3.3.4 


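As a final example of the exponential density \( f(x) = \lambda e^{-\lambda (x - a)} \), \( x \ge a \), we check the
memoryless property for the more general form of the exponential density. For \( s \ge a \) and \( t \ge 0 \),

\[ P(X \ge s + t \mid X \ge s) = \frac{P(X \ge s + t)}{P(X \ge s)} = \frac{e^{-\lambda (s + t - a)}}{e^{-\lambda (s - a)}} = e^{-\lambda t}, \]

so that \( P(X \ge s + t \mid X \ge s) = P(X \ge a + t) \).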


So the memoryless property depends on the value for a. 


3.4 RELIABILITY 


The reliability of a system or of a component in a system refers to the lack of frequency 
with which failures of the system or component occur. Reliable systems or components fail 
less frequently than less reliable systems or components. Suppose, for example, that T, the
time to failure of a light bulb, has an exponential distribution with expected value 10,000
hours. This gives the probability density function as

\[ f(t) = \left(\frac{1}{10000}\right) e^{-t/10000}, \quad t \ge 0. \]

The reliability, R(t), is defined as

\[ R(t) = P(T > t); \]

that is, R(t) gives the probability that the bulb lasts more than t hours. We assume that
R(0) = 1 and we see that

\[ R(t) = P(T > t) = 1 - P(T \le t) = 1 - F(t) \]

and that

\[ -R'(t) = f(t). \]

Since in this case

\[ F(t) = \int_0^t \left(\frac{1}{10000}\right) e^{-u/10000}\,du = 1 - e^{-t/10000}, \]

it follows that \( R(t) = P(T > t) = e^{-t/10000} \).


What is the probability that such a bulb lasts at least 2500 hours? This is
\( R(2500) = e^{-1/4} = 0.7788 \). So although the mean time to failure is 10,000 hours, only about 78% of these bulbs
last longer than 1/4 of the mean lifetime.
Were it crucial that a bulb last 2500 hours, say that this happens with probability 0.95,
what should the mean time to failure be? Let this mean time to failure be m. Then

\[ e^{-2500/m} = 0.95, \]

so m is approximately 48,740 hours.
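
Both light bulb calculations follow from the reliability function of an exponential failure law. The Python sketch below is an added illustration; the function name `reliability` and the variable names are assumptions.

```python
import math

def reliability(t, mean_life):
    """R(t) = P(T > t) for an exponential time to failure with the given mean."""
    return math.exp(-t / mean_life)

# Probability a bulb with mean life 10,000 hours lasts at least 2500 hours.
r_2500 = reliability(2500, 10_000)

# Mean life m needed so that R(2500) = 0.95: solve exp(-2500 / m) = 0.95 for m.
required_mean = -2500 / math.log(0.95)

print(round(r_2500, 4), round(required_mean))
```

The required mean comes out a little under 48,740 hours, which agrees with the value in the text once it is rounded to the nearest ten hours.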


Hazard Rate 


The hazard rate of an item refers to the probability, per unit of time, that an item that has
lasted t units of time will fail within the next \( \Delta t \) units of time. We will denote the hazard rate by
H(t), so

\[ H(t) = \frac{P(t < T < t + \Delta t \mid T > t)}{\Delta t} = \frac{F(t + \Delta t) - F(t)}{\Delta t \cdot [1 - F(t)]}. \]

As \( \Delta t \to 0 \), H(t) approaches

\[ H(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{R(t)} = -\frac{R'(t)}{R(t)}. \]

In actuarial work, the hazard rate is called the force of mortality. The hazard rate also occurs
in econometrics as well as in other fields. In this section, we investigate the consequences
of a constant hazard rate, \( \lambda \). Consequences of a nonconstant hazard rate will be considered
later in this chapter.

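Suppose then that

\[ H(t) = \frac{f(t)}{1 - F(t)} = \lambda, \quad \text{where } \lambda \text{ is a constant.} \]

Since \( \dfrac{f(t)}{1 - F(t)} = -\dfrac{R'(t)}{R(t)} \), we have that

\[ -\frac{R'(t)}{R(t)} = \lambda. \]

It follows that

\[ -\ln[R(t)] = \lambda t + k \quad \text{so that} \quad R(t) = c \cdot e^{-\lambda t}. \]

Now if we suppose that our components begin life at T = 0, then R(0) = 1 and

\[ R(t) = e^{-\lambda t}. \]

Since \( f(t) = -R'(t) \), it follows that \( f(t) = \lambda e^{-\lambda t} \), \( t \ge 0 \).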

A constant hazard rate then produces an exponential failure law. It is easy to show that 
an exponential failure law produces a constant hazard rate. 
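
The converse statement, that an exponential failure law has a constant hazard rate, can be seen numerically. The Python sketch below is an illustration only; the rate 0.05, the step `dt`, and all function names are assumptions. It evaluates both the limiting form f(t)/R(t) and the finite-difference definition of the hazard rate at several ages.

```python
import math

def exp_density(t, lam):
    """f(t) = lam * exp(-lam * t) for t >= 0."""
    return lam * math.exp(-lam * t)

def exp_reliability(t, lam):
    """R(t) = P(T > t) = exp(-lam * t)."""
    return math.exp(-lam * t)

def hazard(t, lam):
    """Limiting hazard rate H(t) = f(t) / R(t)."""
    return exp_density(t, lam) / exp_reliability(t, lam)

def hazard_finite(t, lam, dt=1e-6):
    """P(t < T < t + dt | T > t) / dt, the hazard before the limit is taken."""
    def F(u):
        return 1.0 - exp_reliability(u, lam)
    return (F(t + dt) - F(t)) / (dt * (1.0 - F(t)))

lam = 0.05   # illustrative constant hazard rate
for t in (0.0, 10.0, 100.0):
    # Both columns stay at (approximately) lam, regardless of the age t.
    print(t, hazard(t, lam), hazard_finite(t, lam))
```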

From Example 3.3.3, we conclude that failures occurring according to a Poisson 
process will also produce an exponential time to failure and hence a constant hazard 
rate. 

Typically, the hazard rate is not constant for components. There is generally a “burn-in” 
period where the hazard rate may be declining. The hazard rate then usually becomes con- 
stant, or nearly so, after which it increases. This produces the “bathtub” function, as shown 
in Figure 3.7. 

Different hazard rates, although constant, can have surprisingly different consequences.
Suppose, for example, that component I has constant hazard rate \( \lambda \) while
component II has hazard rate \( k \cdot \lambda \) where k > 0. Then the corresponding reliability
functions are

\[ R_{I}(t) = e^{-\lambda t} \quad \text{while} \]

\[ R_{II}(t) = e^{-k \lambda t} = \left(e^{-\lambda t}\right)^k = [R_{I}(t)]^k. \]

So the probability that component II lasts t units or more is the kth power of the probability
that component I lasts the same time. Since positive powers of probabilities become smaller
as the power k increases, component II may rapidly become useless.

Figure 3.7 A “bathtub” hazard rate function.
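
A small table makes the effect of the multiplier k concrete. The Python sketch below is illustrative; the hazard rate 0.001, the time 500, and the chosen values of k are assumptions, not values from the text.

```python
import math

def reliability(t, rate):
    """R(t) = exp(-rate * t) for a constant hazard rate."""
    return math.exp(-rate * t)

lam, t = 0.001, 500.0        # illustrative hazard rate and mission time
r_one = reliability(t, lam)  # reliability of component I

for k in (1, 2, 5, 10):
    r_two = reliability(t, k * lam)   # component II has hazard rate k * lam
    # r_two equals r_one ** k, so it shrinks quickly as k grows.
    print(k, round(r_two, 4), round(r_one**k, 4))
```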


EXERCISES 3.4 


1. An exponential distribution has \( f(x) = 4e^{-4(x - 2)} \) for \( x \ge 2 \). Find E[X] and Var[X].


2. In Exercise 1, suppose this is a waiting time density. Find the probability of the next 6
values, at most 3 are < 2.


3. Let X be an exponential random variable with mean 6.


(a) Find P(X > 4). 
(b) Find P(X > 4|X > 2). 


4. The median of a probability distribution is the value that is exceeded 1/2 of the time.


(a) Find the median of an exponential distribution with mean \( \lambda \).
(b) Find the probability an observation exceeds \( \lambda \).


5. Snowfall in Indiana follows an exponential distribution with mean 15″ per winter season.
(a) Find the probability the snowfall will exceed 17″ next winter.


(b) Find the probability that in 4 out of the next 5 winters the snowfall will be less than 
the mean. 


6. The length, X, of an international telephone call from a local business follows an exponential
distribution with mean 2 minutes. In dollars, the cost of a call of X minutes is
\( 3X^2 - 6X + 2 \). Find the expected cost of a telephone call.


7. The lengths of life of batteries in transistor radios follow exponential probability distributions.
Radio A takes 2 batteries, each of which has an expected life of 200 hours;
radio B uses 4 batteries, but the expected life of each is 400 hours. Radio A works if
at least one of its batteries operates; radio B works only if at least three of its batteries
operate. An expedition needs a radio that will function at least 500 hours. Which radio
should be taken, or does it not matter?


8. Accidents at a busy intersection follow a Poisson distribution with three accidents
expected in a week.
(a) What is the probability that at least 10 days pass between accidents? 


(b) It has been 9 days since the last accident. What is the probability that it will be 5 
days or more until the next accident? 


9. The diameter of a manufactured part, X, is a random variable whose probability density
function is

\[ f(x) = \lambda e^{-\lambda x}, \quad x > 0. \]


If X < 1, the manufacturer realizes a profit of $3. If X > 1, the part must be discarded 
at a net loss of $1. The machinery manufacturing the part may be set so that A = : or 
A= >: Which setting will maximize the manufacturer’s expected profit? 


10. If X is a random selection from a uniform variable on the interval (0, 1), then the
transformation \( Y = -\lambda \ln(1 - X) \) is known to produce random selections from an exponential
density with mean \( \lambda \).

(a) Use a uniform random number generator to draw a sample of 200 observations
from an exponential density with mean 7.

(b) Draw a histogram of your sample and compare it graphically with the expected
exponential density.

11. The hazard rate of an essential component in a rocket engine is 0.05. Find its reliability
at time 125.

12. An exponential process has R(200) = 0.85. When is R = 0.95?

13. A Poisson process has mean \( \mu \). Show that the waiting time for the second occurrence
is not exponentially distributed.

14. Find the probability an item fails before 200 units of time if its hazard rate is 0.008.

15. Suppose that the life length of an automobile is exponential with mean 72,000 miles.
What is the expected length of life of automobiles that have lasted 50,000 miles?

16. An electronic device costs $K to produce. Its length of life, X, has probability density
function

\[ f(x) = 0.01 e^{-0.01x}, \quad x > 0. \]

If the device lasts less than 3 units of time, the item is scrapped and has no value. If the
life length is between 3 and 6, the item is sold for $5; if the life length is greater than
6, the item is sold for $V. Let Y be the net profit per item. Find the probability density
for Y.

17. Suppose X is a random variable with probability density function
fase, xe 2. 


(a) Show that a = 2. 

(b) Find the cumulative distribution function, F(x). 

(c) Find P(X > 5|X > 3). 

(d) If 8 independent observations are made, what is the probability that exactly 6 of 
them are less than 4? 

18. A lamp contains 3 bulbs, each of which has life length that is exponentially distributed
with mean 1000 hours. If the bulbs fail independently, what is the probability that some
light emanates from the lamp for at least 1200 hours?

19. According to a kinetic theory, the distance, X, that a molecule travels before colliding
with another molecule is described by the probability density function

\[ f(x) = \frac{1}{\lambda} e^{-x/\lambda}, \quad x > 0, \ \lambda > 0. \]


(a) What is the average distance between collisions? 
(b) Find P(X > 6|X > 4). 


3.5 NORMAL DISTRIBUTION 


We come now to the most important continuous probability density function and perhaps
the most important probability distribution of any sort, the normal distribution. On several
occasions, we have observed its occurrence in graphs from, apparently, widely differing
sources: the sums when three or more dice are thrown; the binomial distribution for large
values of n; and in the hypergeometric distribution. There are many other examples as well
and several reasons, which will appear here, to call this distribution “normal.”

Figure 3.8 Standard normal probability density function.

If

\[ f(x) = \frac{1}{b \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - a}{b}\right)^2}, \quad -\infty < x < \infty, \ -\infty < a < \infty, \ b > 0, \tag{3.4} \]

we say that X has a normal probability distribution. A graph of a normal distribution, where
we have chosen a = 0 and b = 1, appears in Figure 3.8.

The shape of a normal curve is highly dependent on the standard deviation. Figure 3.9 
shows some normal curves, each with mean 0, but with different standard deviations. We 
will show presently that a is the mean value and b is the standard deviation of the normal 
curve. 

Figure 3.9 Some normal probability density functions.

Figure 3.10 Revolving the standard normal curve around the y-axis.

We now establish some facts regarding f(x) as defined earlier.

1. f(x) defines a probability density function.
Proof. \( f(x) \ge 0 \) and so we must show that \( \int_{-\infty}^{\infty} f(x)\,dx = 1 \). To do this, let \( Z = \frac{X - a}{b} \)
in (3.4).
We have

\[ \int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz. \]

Consider the curve \( g(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \), \( -\infty < x < \infty \), as shown in Figure 3.10.

Let \( I = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx \). If the curve is revolved around the y-axis, the surface
generated is

\[ f(x, z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x^2 + z^2)}, \quad -\infty < x < \infty, \ -\infty < z < \infty, \]

since this surface has circular cross sections and the proper traces in the coordinate
planes. The volume generated is then

\[ V = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x^2 + z^2)}\,dz\,dx = \frac{1}{\sqrt{2\pi}} \left(\sqrt{2\pi}\, I\right)^2 = \sqrt{2\pi}\, I^2. \]

On the other hand, V can be found using cylindrical shells as

\[ V = 2\pi \int_0^{\infty} x \cdot \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx = \sqrt{2\pi}. \]

So \( I^2 = 1 \) and since I > 0, I = 1.
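So f(x) is a probability density function for \( -\infty < a < \infty \) and \( b > 0 \).

2. We now find the mean and variance for X.

\[ E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\,dx = \int_{-\infty}^{\infty} x \cdot \frac{1}{b \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - a}{b}\right)^2}\,dx. \]

Let \( z = \frac{x - a}{b} \) in the integral so that x = a + bz, and then

\[ E(X) = a \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz + \frac{b}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z \cdot e^{-z^2/2}\,dz, \]

so, since the first integral is 1 and the second integral is 0, E(X) = a.
To find the variance, first calculate

\[ E(X^2) = \int_{-\infty}^{\infty} x^2 \cdot \frac{1}{b \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - a}{b}\right)^2}\,dx. \]

Again let \( z = \frac{x - a}{b} \). Then

\[ E(X^2) = \int_{-\infty}^{\infty} (a + bz)^2 \cdot \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz. \]

Since \( \frac{2ab}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z \cdot e^{-z^2/2}\,dz = 0 \), it follows that
\( E(X^2) = \frac{b^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z^2 e^{-z^2/2}\,dz + a^2 \),
which simplifies to

\[ E(X^2) = b^2 + a^2. \]

We conclude that

\[ \mu = a \quad \text{and} \quad \sigma^2 = b^2. \]

The probability density function for the normal curve is then usually written as

\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad -\infty < x < \infty. \]

We will abbreviate this as

\[ X \sim N(\mu, \sigma), \]

where the symbol ~ is read “is distributed as.”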

This is our first example of a probability density whose formula involves the 
standard deviation. The implications of this will be encountered in examples and 
problems.

3. Finally, we show that if \( X \sim N(\mu, \sigma) \) and if \( Z = \frac{X - \mu}{\sigma} \), then \( Z \sim N(0, 1) \). We call the
N(0, 1) curve the standard or unit normal curve. The statement above indicates that
the transformation \( Z = \frac{X - \mu}{\sigma} \) can be used on an arbitrary normal curve to produce a
standard normal curve.
To show this consider the cumulative distribution function for Z, G(z), assuming
that F(x) is the cumulative distribution function for X. Then by definition,

\[ G(z) = P(Z \le z) = P\left(\frac{X - \mu}{\sigma} \le z\right) = P(X \le \mu + \sigma z) = F(\mu + \sigma z). \]

We now use the fact that

\[ g(z) = \frac{dG(z)}{dz} \]

to see that

\[ \frac{dG(z)}{dz} = \frac{dF(\mu + \sigma z)}{dz} = \frac{dF(\mu + \sigma z)}{d(\mu + \sigma z)} \cdot \frac{d(\mu + \sigma z)}{dz} \]

by the chain rule. It follows that

\[ g(z) = \sigma \cdot f(\mu + \sigma z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty, \]

which is the standard normal distribution. This indicates that problems involving
arbitrary values of \( \mu \) and \( \sigma \) can be solved using a single standard normal
curve.

We will return to the process by which g(z) was established in Chapter 4 and 
apply the same technique to other problems involving functions of random vari- 
ables. For now, we consider some examples of the normal distribution and problems 
using it. 


Example 3.5.1 


Mathematics aptitude scores, X, on the Scholastic Aptitude Test (SAT) are N(500, 100). 

Find (a) the probability an individual’s score exceeds 600 and 

(b) the probability an individual’s score exceeds 600, given that it exceeds 500. 

(a) Many computer algebra systems will calculate P(X > 600) directly. This will be 
found to be 0.158655. If a computer algebra system is not available, a table of the standard 
normal distribution may be used as follows: 

The Z transformation here is \( Z = \frac{X - 500}{100} \), so P(X > 600) = P(Z > 1) = 0.158655
using Table 1 in the Appendix.

(b) Here, we need

\[ P(X > 600 \mid X > 500) = \frac{P(X > 600)}{P(X > 500)} = \frac{0.158655}{0.500000} = 0.317310. \]

Example 3.5.2

What Mathematics SAT score, or greater, can we expect to occur with probability 0.90?
Here, we know that X ~ N(500, 100) and we want to find x so that

\[ P(X \ge x) = 0.90. \]

So, if \( Z = \frac{X - 500}{100} \), then

\[ P\left(Z \ge \frac{x - 500}{100}\right) = 0.90, \]

but \( P(Z \ge -1.281552) = 0.90 \), so

\[ \frac{x - 500}{100} = -1.281552, \]

giving x ≈ 372.

Example 3.5.3 


From Tchebycheff’s inequality, we conclude that the standard deviation is in fact a measure
of dispersion for a distribution, since the probability of the interval from \( \mu - k\sigma \) to \( \mu + k\sigma \)
is at least \( 1 - \frac{1}{k^2} \), a probability that increases as k increases. When the distribution is known,
this probability can be determined exactly by integration. We do this now for a standard
normal density. Again let \( Z = \frac{X - \mu}{\sigma} \).

\[ P(\mu - \sigma \le X \le \mu + \sigma) = P(-1 \le Z \le 1) = 0.6826894921 \]
\[ P(\mu - 2\sigma \le X \le \mu + 2\sigma) = P(-2 \le Z \le 2) = 0.9544997361 \]
\[ P(\mu - 3\sigma \le X \le \mu + 3\sigma) = P(-3 \le Z \le 3) = 0.9973002039 \]


Tchebycheff’s inequality indicates that these probabilities are at least 0, 3/4, and 8/9, 
respectively. 

The earlier results, sometimes called the “2/3, 95%, 99% Rule” can be very useful in 
estimating probabilities using the normal curve. 

For example, to refer again to the Mathematics SAT scores that are N(500, 100), an
estimate for the probability a student’s score is between 400 and 650 may be found by
estimating the probability the corresponding z-score is between −1 and 1.50. We know that 2/3
of the area under the curve is between −1 and 1, and we need to estimate the probability from
1 to 1.50. This can be estimated at 1/2 of the difference between 0.95 and 2/3, giving a total
estimate of 2/3 + (1/2)(0.95 − 2/3) = 0.81. The exact probability is 0.775. It is a rarity
that the answers to probability problems can be estimated in advance of an exact solution.
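
The comparison between the exact normal probabilities, the Tchebycheff bounds, and the rule-of-thumb estimate can be laid out in a short computation. The Python sketch below is an added illustration; it assumes Python 3.8 or later for `statistics.NormalDist`, and the variable names are hypothetical.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal, N(0, 1)

for k in (1, 2, 3):
    exact = z.cdf(k) - z.cdf(-k)   # P(-k <= Z <= k)
    bound = 1 - 1 / k**2           # Tchebycheff lower bound
    print(k, round(exact, 10), bound)

# SAT illustration: P(400 <= X <= 650) is P(-1 <= Z <= 1.5).
exact_sat = z.cdf(1.5) - z.cdf(-1.0)
rule_estimate = 2 / 3 + 0.5 * (0.95 - 2 / 3)
print(round(exact_sat, 3), round(rule_estimate, 2))
```

The exact probabilities sit well above the Tchebycheff bounds, as expected, since the inequality makes no use of the shape of the distribution.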

The occurrence of the normal distribution throughout probability theory is striking. 
In the next section, we explain why the graphs of binomial distributions, considered in 
Chapter 2, become normal in appearance. 


EXERCISES 3.5 


1. Mathematics SAT scores are N(500, 100). 
(a) Find the probability an individual’s score is between 350 and 650. 
(b) Find the probability that one’s score is less than 350, given that it is less than 400.

2. In Exercise 1, find an SAT score a, such that
(a) P(Score < a) = 0.95.
(b) P(a < Score < 650) = 0.30.

3. IQ scores are known to be normally distributed with mean 100 and standard deviation
10.
(a) Find the probability an IQ score exceeds 128. 
(b) Find the probability an IQ score is between 90 and 110. 


4. The size of a boring in a metal block is normally distributed with mean 3 cm and standard
deviation 0.01 cm.
(a) What proportion of the borings have sizes between 2.97 cm and 3.01 cm? 
(b) For the borings exceeding 3.005 cm, what proportion exceeds 3.010 cm? 


5. Brads, which are labeled 3/4″, are actually normally distributed. Manufacturer I produces
brads with mean 3/4″ and standard deviation 0.002″; manufacturer II produces
brads with mean 0.749″ and standard deviation 0.0018″; brads from manufacturer III
have mean 0.751″ and standard deviation 0.0015″. A builder requires brads in the range
3/4″ ± 0.005″. From which manufacturer should the brads be purchased?


6. A soft drink machine dispenses cups of a soft drink whose volume is actually a normal
random variable with mean 12 oz and standard deviation 0.1 oz.
(a) Find the probability a cup of the soft drink contains more than 12.2 oz. 
(b) Find a volume, v, such that 99% of the time the cups contain at least v oz. 


7. Resistors used in an electric circuit have resistances that are normally distributed with
mean 0.21 ohms and standard deviation 0.045 ohms. A resistor is acceptable in the
circuit if its resistance is at most 0.232 ohms. What percentage of the resistors are 
acceptable? 


8. On May 5, in Colorado Springs, temperatures have been found to be normally distributed
with mean 80° and standard deviation 8°. The record temperature on that day
is 90°. 
(a) What is the probability the record of 90° will be broken on next May 5? 


(b) What is the probability the record of 90° will be broken at least three times during 
the next 5 years on May 5? 


9. Sales in a fast food restaurant are normally distributed with mean $42,000 and standard
deviation $2000 during a given sales period. During a recent sales period, sales
were reported to a local taxing authority to be $37,600. Should the taxing authority be 
suspicious? 

10. Suppose that \( X \sim N(\mu, \sigma) \). Find a in terms of \( \mu \) and \( \sigma \) if

(a) P(X > a) = 0.90. 

(b) P(X >a) = 5 P(X <a). 

11. The size of a manufactured part is a normal random variable with mean 100 and variance 25.

If the size is between 95 and 110, the parts can be sold at a profit of $50 each. If 
the size exceeds 110, the part must be reworked and a net profit of $10 is made per 
part. A part whose size is less than 95 must be scrapped at a loss of $20. What is the 
expected profit for this process? 
Rivets are useful in a device if their diameters are between 0.25” and 0.38”. These 
limits are often called upper and lower specification limits. A manufacturer produces 
rivets that are normally distributed with mean 0.30” and standard deviation 0.03”. 

(a) What proportion of the rivets meet specifications? 

(b) Suppose the mean of the manufacturing process could be changed, but the manu- 
facturing process is such that the standard deviation cannot be altered. What should 
the mean of the manufactured rivets be so as to maximize the proportion that meet 
specifications? 

Refer to problem 10. Suppose that \(X \sim N(\mu, \sigma)\) and that upper and lower 
specification limits are U and L, respectively. Show that if \(\sigma\) must be held fixed, 
then the value of \(\mu\) that maximizes \(P(L < X < U)\) is at 

\[ \mu = \frac{U + L}{2}. \]


Manufacturing processes that produce normally distributed output are often compared 
by calculating their process capability indices. The process capability index for a 
process with upper and lower specification limits U and L, respectively, is 

\[ C_p = \frac{U - L}{6\sigma}, \]

where the variable X is distributed \(N(\mu, \sigma)\). 
What can be said about the process under each of the following conditions? 
(a) \(C_p = 1\). 
(b) \(C_p < 1\). 
(c) \(C_p > 1\). 


Upper and lower warning limits are often established for measurements on manufactured 
products. Usually, if \(X \sim N(\mu, \sigma)\), these are set at \(\mu \pm 1.96\sigma\) so that 5% of the 
product is outside the warning limits. Discuss the proportion of the product outside the 
warning limits if the mean of the process increases by one standard deviation. 


Suppose that \(X \sim N(\mu, \sigma)\). Find \(\mu\) and \(\sigma\) if P(X > 2) = ; and P(X > 3) = ;: 


“40 lb” bags of cement have weights that are actually N(39.1, 9.4). 
(a) Find the probability that two of five randomly selected bags weigh less than 40 lbs. 


(b) How many bags must be purchased so that the probability that at least 1/2 of the 
bags weigh at most 40 lb is at least 0.95? 


Suppose X ~ N(0, 1). Find 

(a) P(|X| < 1.5). 

(b) \(P(X^2 > 1)\). 

Signals that are either 0’s or 1’s are sent in a noisy communication circuit. The signal 
received is the signal sent plus a random variable, e, that is, N (0. 2 i If a 0 is sent, the 


receiver will record a 0 when the signal received is at most a value, v; otherwise a 1 is 
recorded. Find v if the probability that a 1 is recorded when a 0 is actually sent is 0.90. 


The diameter of a ball bearing is a normally distributed random variable with mean 6 
and standard deviation >: 


(a) What is the probability a randomly selected ball bearing has a diameter between 5 
and 7? 


(b) If a diameter is between 5 and 7, the bearing can be sold for a profit of $1. If the 
diameter is greater than 7, the bearing may be reworked and sold at a profit of 
$0.50; otherwise, the bearing must be discarded at a loss of $2. Find the expected 
value for the profit. 

21. Capacitors from a manufacturer are normally distributed with mean 5 pf and standard 
deviation 0.4 pf. An application requires four capacitors between 4.3 pf and 5.9 pf. If 
the manufacturer ships 5 randomly selected capacitors, what is the probability that a 
sufficient number of capacitors will be within specifications? 

22. The height, X, a college high jumper will clear each time she jumps is a normal random 
variable with mean 6 feet and variance 5.76 in². 
(a) What is the probability the jumper will clear 6’4” on a single jump? 
(b) What is the greatest height jumped with probability 0.95? 
(c) Assuming the jumps are independent, what is the probability that 6’4” will be 
cleared on exactly three of the next four jumps? 

23. A Chamber of Commerce advertises that about 16% of the motels in town charge $120 
or more for a room and that the average price of a room is $90. Assuming that room 
rates are approximately normally distributed, what is the variance in the room rates? 

24. A commuting student has discovered that her commuting time to school is normally 
distributed; she has two possible routes for her trip. The travel time by Route A has 
mean 55 minutes and standard deviation 9 minutes while the travel time by Route B 
has mean 60 minutes and standard deviation 3 minutes. If the student has at most 63 
minutes for the trip, which route should she take? 

25. The diameter of an electric cable is normally distributed with mean 0.8” and standard 
deviation 0.02”. 
(a) What is the probability the diameter will exceed 0.81”? 
(b) The cable is considered defective if the diameter differs from the mean by more 
than 0.025”. What is the probability a randomly selected cable is defective? 
(c) Suppose now that the manufacturing process can be altered and that the standard 
deviation can be changed while keeping the mean at 0.8”. If the criterion in part (b) 
is used, but we want only 10% of the cables to be defective, what value of \(\sigma\) must 
be met in the manufacturing process? 

26. A cathode ray tube for a computer graphics terminal has a fine mesh screen behind 
the viewing surface, which is under tension produced in manufacturing. The tension 
readings follow an N(275, 40) distribution, where measurements are in units of mV. 
(a) The minimum acceptable tension is 200 mV. What proportion of tubes exceed this 
limit? 
(b) Tension above 375 mV will tear the mesh. Of the acceptable screens, what proportion 
have tensions at most 375 mV? 
(c) Refer to part (a). Suppose it is desired to have 99.5% acceptable screens, and that a 
new quality control manager thinks he can reduce \(\sigma^2\) to an acceptable level. What 
value of \(\sigma^2\) must be attained? 

27. The life lengths of two electronic devices, \(D_1\) and \(D_2\), have distributions N(40, 6) 
and N(45, 3), respectively. If the device is to be used for a 48-hour period, which device 
should be selected? 
28. “One pound” packages of cheese are marketed by a major manufacturer, but the actual 
weight in pounds of a randomly selected package is a normally distributed random 
variable with standard deviation 0.02 lb. The packaging machine has a setting allowing 
the mean value to be varied. 


(a) Federal regulations allow for a maximum of 5% short weights (weights below the 
claim on the label). What should the setting on the machine be? 

(b) A package labeled “one pound” sells for $1.50, but costs only $1 to produce. If 
short weight packages are not sold and if the machine’s mean setting is that in part 
(a), what is the expected profit on 1000 packages of cheese? 


3.6 NORMAL APPROXIMATION TO THE BINOMIAL 
DISTRIBUTION 


Example 3.6.1 


A component used in the construction of an electric motor is produced in a factory assembly 
line. In the past, about 10% of the components have proven unsatisfactory for use in the 
motor. The situation may then be modeled by a binomial process in which p, denoting the 
probability of an unsatisfactory component, is 0.10. The assembly line produces 500 
components per day. If X denotes the number of unsatisfactory components, then the probability 
distribution function is 

\[ P(X = x) = \binom{500}{x}(0.10)^{x}(0.90)^{500-x}, \quad x = 0, 1, \ldots, 500. \]


A graph of the distribution is shown in Figure 3.11. 

Figure 3.11 is centered on the mean value, 500 · (0.10) = 50. Note that the possible 
values of X are from X = 0 to X = 500 but that the probabilities decrease rapidly, so we 
show only a small portion of the curve. 

The graph in Figure 3.11 certainly appears to be normal. We note, however, that, 
although the eye may see a normal curve, there are in reality no points on the graph between 
the X values that are integers, since X can only be an integer. In Figure 3.12, we have used 
the heights of the binomial curve in Figure 3.11 to produce a histogram. If we consider a 
particular value of X, say X = 53, notice that the base of the bar runs from 52.5 to 53.5 
(both impossible values for X!) and that the height of the bar is P(X = 53). Thus, the area 
of the bar at X = 53, since the base is of length 1, is P(X = 53). This is the key that allows 
us to estimate binomial probabilities by the normal curve. 

Figure 3.11 Binomial distribution, n = 500, p = 0.10. 

Figure 3.12 A histogram for the binomial distribution, n = 500, p = 0.10. 

Figure 3.13 Normal curve approximation for the binomial, n = 500, p = 0.10. 

Figure 3.13 shows a normal curve imposed on the histogram of Figure 3.12. What 
normal curve should be used? It is natural to use a normal curve with mean and variance 
equal to the mean and variance of the binomial distribution which is being estimated, so we 
have used \(N(500 \cdot 0.10, \sqrt{500 \cdot (0.10) \cdot (0.90)}) = N(50, \sqrt{45})\), the second parameter being the 
standard deviation. 

To estimate P(X = 53), we find \(P(52.5 \le X \le 53.5)\) using the approximation 
\(N(50, \sqrt{45})\); this gives 0.0537716. The exact probability is 0.0524484. 

As a final example, consider the probability the assembly line produces between 36 
and 42 unsatisfactory components. This is estimated by \(P(35.5 \le X \le 42.5)\) where 
\(X \sim N(50, \sqrt{45})\). This is 0.116449. The exact probability is 0.118181. 
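
The two comparisons above can be reproduced with a short program. The following is a minimal sketch in Python using only the standard library; the helper names (binom_pmf, normal_cdf, normal_approx) are illustrative choices, not part of the text. It assumes the normal curve has mean np and standard deviation equal to the square root of np(1 - p), and it applies the half-unit adjustment to the endpoints described above.

```python
from math import comb, erf, sqrt

def binom_pmf(x, n, p):
    """Exact binomial probability P(X = x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def normal_cdf(x, mu, sigma):
    """Normal cumulative distribution function, written with the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def normal_approx(lo, hi, n, p):
    """Approximate P(lo <= X <= hi) by the area from lo - 0.5 to hi + 0.5."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return normal_cdf(hi + 0.5, mu, sigma) - normal_cdf(lo - 0.5, mu, sigma)

n, p = 500, 0.10

# P(X = 53): exact value and normal approximation
exact_53 = binom_pmf(53, n, p)
approx_53 = normal_approx(53, 53, n, p)

# P(36 <= X <= 42): exact value and normal approximation
exact_36_42 = sum(binom_pmf(x, n, p) for x in range(36, 43))
approx_36_42 = normal_approx(36, 42, n, p)

print(exact_53, approx_53)
print(exact_36_42, approx_36_42)
```

The printed values should agree with the figures quoted in the example to within rounding.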

When the sum of a large number of binomial probabilities is needed, a computer alge- 
bra system might be used to calculate the result exactly, although the computation might 
well be lengthy. The same computer algebra system would also, more quickly and easily, 
calculate the relevant normal probabilities. In any event, whether the approximation is used 
or not, the approximation of the binomial distribution by the normal distribution is a striking 
fact. We will justify the approximation more thoroughly when we consider sums of random 
variables in Chapter 4. We note now that the approximation works well for moderate or large 
values of n, the quality of the approximation depending somewhat on the value of p. 

In using a normal approximation to a binomial distribution, it is well to check the tail 
probabilities, hoping that these are small so that the approximation is an appropriate one. 
For example, if we want to approximate \(P(9 \le X \le 31)\), we should check that the z-score 
for 31.5 exceeds 2.50 and that the z-score for 8.5 is less than −2.50 since these scores have 
less than 1% of the curve in each tail. 
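
The z-score check just described is easy to automate. The sketch below is illustrative only: the text does not give n and p for this example, so the values n = 100 and p = 0.2 are hypothetical, chosen so that the mean is 20 and the standard deviation is 4. The function name tail_z_scores is likewise an assumption.

```python
from math import sqrt

def tail_z_scores(lo, hi, n, p):
    """z-scores of lo - 0.5 and hi + 0.5 for a binomial with parameters n and p."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return (lo - 0.5 - mu) / sigma, (hi + 0.5 - mu) / sigma

# Hypothetical parameters: mean 20, standard deviation 4
z_low, z_high = tail_z_scores(9, 31, n=100, p=0.2)

# The check from the text: both endpoints more than 2.50 standard deviations out
check_passes = z_low < -2.50 and z_high > 2.50
print(z_low, z_high, check_passes)
```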


EXERCISES 3.6 


In solving these problems, find both the exact answer using a binomial distribution and the 
result given by the normal approximation. 


1. A loaded coin comes up heads with probability 0.6. In 50 tosses, find the probability 
of between 28 and 32 heads. 

2. Given X a binomial random variable with n = 200 and p = 0.4. Find P(X = 80). 

3. In 100 tosses of a fair coin, show that 50 heads and 50 tails is the most probable out- 
come, but that this event has probability of only about 0.08. Show that this compares 
favorably to the occurrence of at least 58 heads. 

4. A manufacturer of components for electric motors has found that about 10% of the 
production will not meet customer specifications. Find the probability that in a lot of 
500 components, 

(a) exactly 53 do not meet customer specifications. 
(b) between 36 and 42 (inclusive) components do not meet customer specifications. 

5. A system of 50 components functions if at least 90% of the components function prop- 
erly. 

(a) Find the probability the system operates if the probability a component operates 
properly is 0.85. 

(b) Suppose now that the probability a component operates properly is p. Find p if the 
probability the system operates properly is 0.95. 


6. An acceptance sampling plan accepts a lot if at most 3% of a sample randomly chosen 
from a very large lot of items does not meet customer specifications. In the past, 2% of 
the items do not meet customer specifications. Find the probability the lot is accepted 
if the sample size is 
(a) 10 
(b) 100 
(c) 1000. 

7. A fair coin is tossed 1000 times. Let X denote the number of heads that occur. Find k 
so that P(500 − k < X < 500 + k) = 0.90. 

8. An airline finds that, for a certain flight, 3% of the ticketed passengers do not appear for 
the flight. The plane holds 125 people. How many tickets should be sold if the airline 
wants to carry all the passengers who show up with probability 0.99? 
9. Sam and Joe operate competing minibuses for travel from a central point in a city to 
the airport. Passengers appear and are equally likely to choose either minibus. During 
a given time period, 40 passengers appear. How many seats should each minibus have 
if Sam and Joe each want to accommodate all the passengers who show up for their 
minibus with probability 0.95? 

10. A candidate in an election knows that 52% of the voters will vote for her. What is the 
probability that, out of 200 voters, she receives at least 50% of the votes? 


11. A fair die is rolled 1200 times. Find the probability that at least 210 sixes appear. 

12. In 10,000 tosses of a coin, 5150 heads appear. Is the coin loaded? 

13. The length of life of a fluorescent fixture has an exponential distribution with expected 
life length 10,000 hours. Seventy of these bulbs operate in a factory. Find the probability 
that at most 40 of them last at least 8000 hours. 

14. Suppose that X is uniformly distributed on [0,10]. 

(a) Find P(X > 7). 

(b) Among 4 randomly chosen observations of X, what is the probability that at least 
2 of these are greater than 7? 

(c) What is the probability that, of 1000 observations, at least 730 are greater than 7? 

15. Two percent of the production of an industrial process is not acceptable for sale. Suppose 
the company produces 1000 items a day. What is the probability a day’s production 
contains between 1.4% and 2.2% nonacceptable items? 


3.7 GAMMA AND CHI-SQUARED DISTRIBUTIONS 


In Section 3.3 of this chapter, we considered the waiting time until the first Poisson event 
occurred and found that the waiting time followed an exponential distribution. We now 
want to consider the waiting time for the second Poisson event. 

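To make matters specific, suppose that the Poisson random variable has parameter \(\lambda\) 
and that Y is the waiting time for the second event. In y units of time, we expect \(\lambda \cdot y\) events. 
Now if \(Y > y\) there is at most 1 event in y units of time, so 

\[ P(Y > y) = P(X = 0 \text{ or } 1) = \sum_{x=0}^{1} \frac{e^{-\lambda y}(\lambda y)^{x}}{x!}. \]

It follows that 

\[ F(y) = P(Y \le y) = 1 - \sum_{x=0}^{1} \frac{e^{-\lambda y}(\lambda y)^{x}}{x!}, \]

that is, 

\[ F(y) = 1 - e^{-\lambda y} - \lambda y e^{-\lambda y}, \]

so 

\[ f(y) = \frac{dF(y)}{dy} = \lambda^{2} y e^{-\lambda y}, \quad y \ge 0. \]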

A graph of f(y) is shown in Figure 3.14. 
Here, f(y) is an example of a more general distribution, called the gamma distribution. 

Figure 3.14 Waiting time for the second Poisson event. 


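Consider now waiting for the rth Poisson event from a Poisson distribution with parameter 
\(\lambda\), and let Y denote the waiting time. Then at most r − 1 events must occur in y units 
of time, so 

\[ 1 - F(y) = P(Y \ge y) = \sum_{x=0}^{r-1} \frac{e^{-\lambda y}(\lambda y)^{x}}{x!}. \]

It follows that 

\[ f(y) = \frac{dF(y)}{dy} = \lambda e^{-\lambda y}\sum_{x=0}^{r-1} \frac{(\lambda y)^{x}}{x!} - \lambda e^{-\lambda y}\sum_{x=1}^{r-1} \frac{(\lambda y)^{x-1}}{(x-1)!}. \]

This sum collapses leaving 

\[ f(y) = \frac{e^{-\lambda y}\lambda^{r} y^{r-1}}{(r-1)!}, \quad y \ge 0. \]

Here, f(y) defines what we call a gamma distribution. The exponential distribution is a 
special case of f(y) when r = 1. 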
Since f(y) must be a probability density function, it follows that 

\[ \int_{0}^{\infty} \frac{e^{-\lambda y}\lambda^{r} y^{r-1}}{(r-1)!}\,dy = 1. \]

Now, letting \(x = \lambda \cdot y\), it follows that 

\[ \int_{0}^{\infty} e^{-x}x^{r-1}\,dx = (r-1)! \quad \text{if } r \text{ is a positive integer.} \]

This integral is commonly denoted by \(\Gamma(r)\). So 

\[ \Gamma(r) = \int_{0}^{\infty} e^{-x}x^{r-1}\,dx = (r-1)! \quad \text{if } r \text{ is a positive integer.} \]

Figure 3.15 A gamma distribution. 


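Now consider the expected value: 

\[ E(Y) = \int_{0}^{\infty} y \cdot \frac{e^{-\lambda y}\lambda^{r} y^{r-1}}{(r-1)!}\,dy = \frac{r}{\lambda}\int_{0}^{\infty} \frac{e^{-\lambda y}\lambda^{r+1} y^{r}}{r!}\,dy. \]

So, since \(\int_{0}^{\infty} e^{-\lambda y}\lambda^{r+1}y^{r}\,dy = r!\), 

\[ E(Y) = \frac{r}{\lambda}. \]

It can also be shown that 

\[ \mathrm{Var}(Y) = \frac{r}{\lambda^{2}}. \]

Graphs of f(y) are also easy to produce using a computer algebra system. Figure 3.15 shows 
f(y) for r = 7 and \(\lambda = 2\). 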

Again, the normal-like appearance of the graph is striking and so we consider a numer- 
ical example to investigate this phenomenon. It will follow from considerations given in 
Chapter 4 that f(y) does indeed approach a normal distribution. 

Suppose the Poisson process has \(\lambda = 2\) and we consider Y, the waiting time for the 
seventh occurrence. Then 

\[ f(y) = \frac{e^{-2y}\,2^{7}\,y^{6}}{6!}, \quad y \ge 0. \]

It follows that 

\[ P(2 \le Y \le 5) = \int_{2}^{5} \frac{e^{-2y}\,2^{7}\,y^{6}}{6!}\,dy = 0.759185. \]

Using earlier formulas, \(E(Y) = 7/2\) and \(\mathrm{Var}(Y) = 7/4\), so the normal curve approximation uses 
the normal curve \(N(7/2, \sqrt{7}/2)\). This gives an approximation of 0.743161. The normal curve 
approximates this gamma distribution fairly well here. 
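
This comparison can be checked numerically. The sketch below is one possible way to do it in Python, not the method used in the text: it evaluates the gamma probability through the Poisson sum for P(Y > y) rather than by integration, and it assumes the normal curve has mean r divided by lambda and standard deviation equal to the square root of r, divided by lambda. The names gamma_sf and normal_cdf are illustrative.

```python
from math import erf, exp, factorial, sqrt

def gamma_sf(y, r, lam):
    """P(Y > y) for the waiting time to the r-th Poisson event:
    at most r - 1 events occur in y units of time."""
    return sum(exp(-lam * y) * (lam * y)**x / factorial(x) for x in range(r))

def normal_cdf(x, mu, sigma):
    """Normal cumulative distribution function, written with the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

r, lam = 7, 2

# Exact gamma probability P(2 <= Y <= 5)
exact = gamma_sf(2, r, lam) - gamma_sf(5, r, lam)

# Normal approximation with matching mean and standard deviation
mu, sigma = r / lam, sqrt(r) / lam
approx = normal_cdf(5, mu, sigma) - normal_cdf(2, mu, sigma)

print(exact, approx)
```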
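We return now to the gamma function. We see that if r is a positive integer, \(\Gamma(r) = (r-1)!\), 
so the graph of \(\Gamma(r)\) passes through the points \((r, (r-1)!)\). But \(\Gamma(r)\) has values when r is 
not a positive integer. For example, 

\[ \Gamma\left(\tfrac{1}{2}\right) = \int_{0}^{\infty} e^{-y}y^{-1/2}\,dy. \]

Letting \(y = z^{2}/2\) in this integral as well as inserting factors of \(\sqrt{2\pi}\) results in 

\[ \Gamma\left(\tfrac{1}{2}\right) = \sqrt{2}\cdot\sqrt{2\pi}\int_{0}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-z^{2}/2}\,dz. \]

The integral is 1/2 the area under a standard normal curve and so is 1/2. So 

\[ \Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}. \]

Consequently, the gamma distribution is then often written as 

\[ f(y) = \frac{e^{-\lambda y}\lambda^{r} y^{r-1}}{\Gamma(r)}, \quad y \ge 0. \]

A special case of the gamma distribution occurs when \(\lambda = 1/2\) and \(r = n/2\). The distribution 
then takes the form 

\[ f(x) = \frac{e^{-x/2}\,x^{n/2-1}}{2^{n/2}\,\Gamma(n/2)}, \quad x > 0. \]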


Here X is said to follow a Chi-squared distribution with n degrees of freedom, which we 
denote by \(\chi^{2}_{n}\). The exponent 2 has no particular significance; it is simply part of the notation 
which is in general use. We will discuss this distribution in greater detail in Chapter 4, but, 
since it is a special case of the gamma distribution, and since it has a large variety of practical 
applications, we show an example of its use now. 

First, let us look at some graphs of \(\chi^{2}_{n}\) for some specific values of n. These graphs, 
which can be produced by a computer algebra system, are shown in Figure 3.16. 

Figure 3.16 Some \(\chi^{2}_{n}\) distributions. 

Again we note the approach to normality as n increases. This fact will be established 
in Chapter 4. 
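
The facts about the gamma function and the Chi-squared special case can be spot-checked numerically. This is a small sketch using Python's standard math.gamma function; the function names gamma_pdf and chi2_pdf and the test points are illustrative choices, not taken from the text.

```python
from math import exp, gamma, isclose, pi, sqrt

def gamma_pdf(y, r, lam):
    """Gamma density written with the gamma function in the denominator."""
    return exp(-lam * y) * lam**r * y**(r - 1) / gamma(r)

def chi2_pdf(x, n):
    """Chi-squared density with n degrees of freedom."""
    return exp(-x / 2) * x**(n / 2 - 1) / (2**(n / 2) * gamma(n / 2))

# Gamma(1/2) equals the square root of pi
half_ok = isclose(gamma(0.5), sqrt(pi))

# Gamma(r) = (r - 1)! for a positive integer r, here r = 5
factorial_ok = isclose(gamma(5), 24)

# The Chi-squared density is the gamma density with lam = 1/2 and r = n/2
special_case_ok = all(
    isclose(chi2_pdf(x, n), gamma_pdf(x, n / 2, 0.5))
    for n in (1, 2, 5, 10)
    for x in (0.5, 1.0, 4.0)
)

print(half_ok, factorial_ok, special_case_ok)
```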


Example 3.7.1 


A production line produces items that can be classified as “Good,” “Bad,” or “Rework,” 
the latter category indicating items that are not satisfactory on first production but which 
could be subject to further work and sold as good items. The line, in the past, has been 
producing 85%, 5%, and 10% in the three categories, respectively. 800 items are produced 
in 1 day of which 665 are good, 30 are bad, and 95 need to be reworked. These numbers, 
of course, are not exactly those expected, but the increase in items to be reworked worries 
the plant management. Has the process in fact changed or is the sample simply the result 
of random variation? 

This is another instance of statistical inference since we use the sample to draw a 
conclusion regarding the population, or universe, from which it is selected. We lack of course 
an obvious random variable to use in this case. We might begin by computing the expected 
numbers in the three categories, which are 680, 40, and 80. Let the observed number in the 
ith category be \(O_i\) and the expected number in the ith category be \(E_i\). It can then be shown, 
although not at all easily, that 

\[ \sum_{i=1}^{n} \frac{(O_i - E_i)^{2}}{E_i} \]

follows a \(\chi^{2}_{n-1}\) distribution where n is the number of categories. 
In this case, we calculate 

\[ \chi^{2}_{2} = \frac{(665-680)^{2}}{680} + \frac{(30-40)^{2}}{40} + \frac{(95-80)^{2}}{80} = 5.643382353. \]

The \(\chi^{2}_{2}\) curve is an exponential distribution, 

\[ f(x) = \tfrac{1}{2}\,e^{-x/2}, \quad x \ge 0. \]

This point is quite far out in the right-hand tail of the \(\chi^{2}_{2}\) distribution as Figure 3.17 indicates. 

Figure 3.17 \(\chi^{2}_{2}\) distribution. 

It is easy to find that \(P(\chi^{2}_{2} > 5.643382353) = 0.0595052\). So we are faced with a decision: 
if the process has in fact not changed at all, the value of \(\chi^{2}_{2}\) will exceed that of our 
sample only about 6% of the time. That is, simple random variation will produce this value 
of \(\chi^{2}_{2}\), or an even greater value, about 6% of the time. Since this is fairly small, we would 
probably conclude that the sample is not simply a consequence of random variation and 
that the production process had changed. 
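
The calculation in this example is short enough to reproduce directly. The sketch below uses the observed and expected counts quoted above and relies on the fact, noted in the example, that the Chi-squared curve with 2 degrees of freedom is exponential, so its right-hand tail area has a closed form. The variable names are illustrative.

```python
from math import exp

observed = [665, 30, 95]   # good, bad, rework counts quoted in the example
expected = [680, 40, 80]   # expected counts quoted in the example

# Sum of (O_i - E_i)^2 / E_i over the categories
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With 2 degrees of freedom the density is (1/2) e^(-x/2),
# so the area to the right of the statistic is e^(-statistic/2).
tail_probability = exp(-statistic / 2)

print(statistic, tail_probability)
```

For other numbers of degrees of freedom the tail area has no such simple form and would need a numerical routine or tables.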


EXERCISES 3.7 


1. A Poisson distribution has parameter 2. 
(a) Find the probability distribution for the waiting time for the third event. 
(b) Find the probability waiting time of six events. 

2. Grades in a statistics course are A, 12; B, 20; and C, 8. Is the professor correct in saying 
that the respective probabilities are A, 15%; B, 60%; and C, 25%? 

3. A book publisher finds that the yearly sales, X, of a textbook (in thousands of books) 
follows a gamma distribution with \(\lambda = 10\) and r = 5. 
(a) Find the mean and variance for the yearly sales. 
(b) Find the probability that the number of books sold in 1 year is between 200 and 
600. 


(c) Sketch the probability density function for X. 
4. Particles are emitted from a radioactive source with three particles expected per minute. 


(a) Find the probability density function for the waiting time for the fourth particle to 
be emitted. 


(b) Find the mean and variance for the waiting time for the fourth particle. 


(c) Find the probability that at least 20 seconds elapse before the fourth particle is 
emitted. 


5. Weekly sales, S, in thousands of dollars, for a small shop follow a gamma distribution 
with \(\lambda = 1\) and r = 2. 
(a) Sketch the probability density function for S. 
(b) Find \(P[S > 2 \cdot E(S)]\). 
(c) Find \(P(S > 1.5 \mid S > 1)\). 

6. Yearly snowfall, S, in inches, in Southern Colorado follows a gamma distribution with 
\(\lambda = 2\) and r = 3. 
(a) Find the probability at least 8 inches of snow will fall in a given year. 
(b) If 6 inches of snow have fallen in a given year, what is the probability of at least 
two more inches of snow? 
(c) Find \(P(\mu - \sigma < S < \mu + \sigma)\). 

7. Show, using integration by parts, that \(\Gamma(n) = (n-1)!\) if n is a positive integer. 


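8. Show that \(\Gamma\left(n + \tfrac{1}{2}\right) = \dfrac{\Gamma(2n+1)\sqrt{\pi}}{2^{2n}\,\Gamma(n+1)}\). 

9. Show that \(\dbinom{-a}{k} = \dfrac{(-1)^{k}\,\Gamma(a+k)}{\Gamma(k+1)\,\Gamma(a)}\). 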
10. If X is a standard normal variable, then it is known that \(X^{2}\) follows a \(\chi^{2}_{1}\) distribution. 
Calculate \(P(X^{2} < 2)\) in two different ways. 

11. A die is tossed 60 times with the following results: 

Face            1    2    3    4    5    6 
Observations    8   12    9    8   10   13 

Is the die fair? 

12. Show that \(E(\chi^{2}_{n}) = n\) and that \(\mathrm{Var}(\chi^{2}_{n}) = 2n\) by direct integration. 


13. Phone calls come into a switchboard according to a Poisson process at the rate of 5 
calls per hour. Let Y denote the waiting time for the first call to arrive. 


(a) Find P(Y > y). 
(b) Find the probability density function for Y. 
(c) Find P(Y > 10). 


3.8 WEIBULL DISTRIBUTION 


We considered the reliability of a product and the hazard rate in Section 3.3. We showed 
there that a constant hazard rate produced an exponential time-to-failure law. Now let us 
consider nonconstant hazard rates. A variety of time-to-failure laws is used to produce 
non-constant hazard rates. As an example, we consider a Weibull distribution here since 
it provides such a variable hazard rate. In addition to providing a variable hazard rate, a 
Weibull distribution can be shown to hold when the performance of a system is governed 
by the least reliable of its components, which is not an uncommon occurrence. 

We use the phrase “a Weibull distribution” to point out the fact that the distributions 
about to be described vary widely in appearance and properties and in fact define an entire 
family of related distributions. 

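We recall some facts from Section 3.4 first. Recall that if f(t) defines a time-to-failure 
probability distribution, then the reliability function is 

\[ R(t) = P(T > t) = 1 - P(T \le t) = 1 - F(t). \]

The hazard rate is 

\[ h(t) = \frac{f(t)}{R(t)} = -\frac{R'(t)}{R(t)}. \qquad (3.5) \]

Now suppose that 

\[ h(t) = \frac{\alpha}{\beta^{\alpha}}\,t^{\alpha-1}, \quad \alpha > 0,\ \beta > 0,\ t \ge 0. \]

Formula (3.5) indicates that 

\[ -\frac{R'(t)}{R(t)} = \frac{\alpha}{\beta^{\alpha}}\,t^{\alpha-1}, \]

from which we find, since R(0) = 1, that 

\[ -\ln R(t) = \left(\frac{t}{\beta}\right)^{\alpha} \]

and so 

\[ R(t) = e^{-(t/\beta)^{\alpha}}, \quad t \ge 0. \]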
Figure 3.19 Some reliability functions (curves labeled a = 2, b = 1 and a = 1/4, b = 1/2). 


We also find that 

\[ f(t) = -\frac{dR(t)}{dt} = \frac{\alpha}{\beta^{\alpha}}\,t^{\alpha-1}\,e^{-(t/\beta)^{\alpha}}, \quad t \ge 0. \]

f(t) describes the Weibull family of probability distributions. Varying \(\alpha\) and \(\beta\) produces 
graphs of different shapes as shown in Figure 3.18. 
The reliability functions, R(t), also differ widely as shown in Figure 3.19. 
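The mean and variance of a Weibull distribution are found using the gamma function. 
We find, for a Weibull distribution with parameters \(\alpha\) and \(\beta\), that 

\[ E(T) = \beta \cdot \Gamma\left(\frac{1}{\alpha} + 1\right) \quad \text{and} \]

\[ \mathrm{Var}(T) = \beta^{2} \cdot \left\{ \Gamma\left(\frac{2}{\alpha} + 1\right) - \left[\Gamma\left(\frac{1}{\alpha} + 1\right)\right]^{2} \right\}. \]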
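
The Weibull quantities in this section can be collected into a few small functions. The following is a sketch only. It assumes the parametrization in which the reliability function is the exponential of minus (t divided by beta) raised to the power alpha, which is the form consistent with the mean formula above; the function names and the sample values alpha = 2, beta = 10 are illustrative.

```python
from math import exp, gamma, isclose

def weibull_reliability(t, alpha, beta):
    """R(t) = P(T > t) under the assumed parametrization."""
    return exp(-((t / beta) ** alpha))

def weibull_hazard(t, alpha, beta):
    """Hazard rate h(t) = (alpha / beta^alpha) * t^(alpha - 1)."""
    return alpha / beta**alpha * t ** (alpha - 1)

def weibull_pdf(t, alpha, beta):
    """Density f(t) = h(t) * R(t)."""
    return weibull_hazard(t, alpha, beta) * weibull_reliability(t, alpha, beta)

def weibull_mean(alpha, beta):
    """E(T) = beta * Gamma(1/alpha + 1)."""
    return beta * gamma(1 / alpha + 1)

def weibull_variance(alpha, beta):
    """Var(T) = beta^2 * (Gamma(2/alpha + 1) - Gamma(1/alpha + 1)^2)."""
    return beta**2 * (gamma(2 / alpha + 1) - gamma(1 / alpha + 1) ** 2)

# Illustrative values: alpha = 2, beta = 10
alpha, beta = 2, 10
prob_3_to_7 = weibull_reliability(3, alpha, beta) - weibull_reliability(7, alpha, beta)
print(weibull_mean(alpha, beta), weibull_variance(alpha, beta), prob_3_to_7)

# With alpha = 1 the hazard rate is constant and the exponential case is recovered
print(isclose(weibull_mean(1, beta), beta), isclose(weibull_variance(1, beta), beta**2))
```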
EXERCISES 3.8 


1. The lifetime of a part is a Weibull random variable with \(\alpha = 2\) and \(\beta = 10\) years. 
(a) Sketch the probability density function. 
(b) Find the probability the part lasts between 3 and 7 years. 
(c) Find the probability a 3-year-old part lasts at least 7 years. 

2. For the part described in problem 1, 
(a) If the part carries a 15-year warranty, what percentage of the parts are still good at 
the end of the warranty period? 
(b) What should the warranty be if it is desired to have 99% of the parts still good at 
the end of the warranty period? 

3. The hazard rate for a generator is \(10^{-4}\,t\)/hour. 
(a) Find R(t), the reliability function. 
(b) Find the expected length of life for the generator. 
(c) Find the probability the generator lasts at least 150 hours. 

4. A component’s life length follows a Weibull distribution with \(\alpha = 1/3\), \(\beta = 1/27\). 
(a) Plot the probability density function for the life length. 
(b) Determine the hazard rate. 
(c) Find the probability the component lasts at least 2 hours. 

5. One component of a satellite has a hazard rate of 10~°?/hour. 
(a) Plot R(t), the reliability function. 
(b) Find the probability the component fails within 100 hours. 

6. How many of the components for the satellite in problem 5 must be used if we want 
the probability that at least one lasts at least 100 hours to be 0.99? 

7. Find the median of a Weibull distribution with parameters \(\alpha\) and \(\beta\). 

8. A Weibull random variable, X, has \(\alpha = 4\) and \(\beta = 30\). Compare the exact value of 
P(20 < X < 30) with the normal approximation to that probability. 

9. It has been noticed that 56% of a heavily used industrial bulb last at most 10,000 hours. 
Assuming that the life lengths of these bulbs follow a Weibull distribution with \(\beta = 3\), 
what proportion of the bulbs will last at least 15,000 hours? 


CHAPTER REVIEW 


Random variables that can assume any value in an interval or intervals are called continuous 
random variables; they are the subject of this chapter. 


It is clear that the probability distribution function, f(x) = P(X = x), which was of 
primary importance in our work with discrete random variables, is of little use when X 
is continuous, since, in that case, P(X = x) = 0 for all values of x, so this function carries 
no information whatsoever. It is possible, however, to distinguish different continuous 
random variables by a probability density function, f(x), which has the following 
properties: 

1. \(f(x) \ge 0\) 
2. \(\int_{-\infty}^{\infty} f(x)\,dx = 1\) 
3. \(\int_{a}^{b} f(x)\,dx = P(a \le X \le b)\) 

We study several of the most important probability density functions in this chapter. 
The mean and variance of a continuous random variable can be calculated by 

\[ E(X) = \mu = \int_{-\infty}^{\infty} x \cdot f(x)\,dx \quad \text{and} \]

\[ \mathrm{Var}(X) = \sigma^{2} = E(X - \mu)^{2} = \int_{-\infty}^{\infty} (x - \mu)^{2} \cdot f(x)\,dx \]

provided, of course, that the integrals are convergent. 
It is often useful to use the fact that 

\[ \sigma^{2} = E(X^{2}) - [E(X)]^{2} \]

when calculating \(\sigma^{2}\). 
The first distribution considered was the uniform distribution defined by 

\[ f(x) = \frac{1}{b-a}, \quad a \le x \le b. \]

We found that 

\[ \mu = \frac{a+b}{2} \quad \text{and that} \quad \sigma^{2} = \frac{(b-a)^{2}}{12}. \]

The most general form of the exponential distribution is 

\[ f(x) = \lambda e^{-\lambda(x-a)}, \quad x \ge a, \text{ where } \lambda > 0. \]

A computer algebra program or direct integration shows that 

\[ E(X) = \int_{a}^{\infty} x \cdot f(x)\,dx = a + \frac{1}{\lambda} \quad \text{and that} \]

\[ \mathrm{Var}(X) = \frac{1}{\lambda^{2}}. \]

An interesting fact is that the waiting time for the first occurrence of a Poisson random 
variable is an exponential variable. 

We then discussed reliability since this is an important modern application of probability 
theory. We defined the reliability as 

\[ R(t) = P(T > t) \]

where T is a random variable. The reliability then gives the probability that a component whose 
lifetime is the random variable T lasts at least t units of time. 
The (instantaneous) hazard rate is the probability, per unit of time, that an item that 
has lasted t units of time will fail within the next \(\Delta t\) units of time. We found that the hazard rate, 
H(t), is 

\[ H(t) = \frac{f(t)}{1 - F(t)}, \]

where f(t) and F(t) are the probability density and distribution functions for T, respectively. 
The normal distribution, without doubt the most important continuous distribution of 
all, was considered next. We showed that its most general form is 

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2\sigma^{2}}(x-\mu)^{2}}, \quad -\infty < x < \infty. \]

If X has the normal distribution above, we write \(X \sim N(\mu, \sigma)\). 

An important fact is that if \(X \sim N(\mu, \sigma)\) and if \(Z = (X - \mu)/\sigma\), then \(Z \sim N(0, 1)\), a distribution 
that is referred to as the standard normal distribution. This fact allows a wide variety of 
normal curve calculations to be carried out using a single normal curve. This is a very 
unusual circumstance in probability theory, distributions often being highly dependent on 
sample size, for example, as we will see in later chapters. 

The normal curve arises in a multitude of places; one of its most important uses is that 
it can be used to approximate a binomial distribution. We discussed the approximation of 
a binomial variable with parameters n and p by a \(N(np, \sqrt{npq})\) curve. 

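Two distributions whose importance will be highlighted in later chapters are the gamma 
and Chi-squared distributions. The gamma distribution arises when we wait for the rth 
Poisson occurrence. Its probability density function is 

\[ f(y) = \frac{e^{-\lambda y}\lambda^{r} y^{r-1}}{\Gamma(r)}, \quad y \ge 0, \]

where \(\lambda\) is the parameter in the Poisson distribution. 
The Chi-squared distribution arises when \(\lambda = 1/2\) and \(r = n/2\). 
Finally, we considered the Weibull family of distributions whose probability density 
functions are members of the family 

\[ f(t) = \frac{\alpha}{\beta^{\alpha}}\,t^{\alpha-1}\,e^{-(t/\beta)^{\alpha}}, \quad \alpha > 0,\ \beta > 0,\ t \ge 0. \]

It is fairly easy to show that 

\[ E(T) = \beta \cdot \Gamma\left(\frac{1}{\alpha} + 1\right) \quad \text{and} \]

\[ \mathrm{Var}(T) = \beta^{2} \cdot \left\{ \Gamma\left(\frac{2}{\alpha} + 1\right) - \left[\Gamma\left(\frac{1}{\alpha} + 1\right)\right]^{2} \right\}. \]

The Weibull distribution is of importance in reliability theory; several examples were given 
in the chapter. 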
PROBLEMS FOR REVIEW 


Exercises 3.1 # 2, 3, 4, 5, 7, 10, 14, 18 
Exercises 3.2 # 1, 2, 4, 7, 9, 10 
Exercises 3.4 # 1, 2, 5, 7, 9, 10, 16 
Exercises 3.5 # 1, 3, 5, 8, 9, 10, 15, 16, 17, 19, 23, 26 
Exercises 3.6 # 1, 2, 3, 7, 10, 12 
Exercises 3.7 # 2, 3, 6, 8, 11 
Exercises 3.8 # 1, 3, 4, 6, 7, 9 


SUPPLEMENTARY EXERCISES FOR CHAPTER 3 


1. A machining operation produces steel shafts having diameters that are normally dis- 
tributed with mean 1.005 inches and standard deviation 0.01 inches. If specifications 
call for diameters to fall in the interval 1.000+0.02 inches, what percentage of the steel 
shafts will fail to meet specifications? 


2. Electric cable is made by two different manufacturers, each of whom claims that 
the diameters of their cables are normally distributed. The diameters, in inches, 
from Manufacturer I are N(0.80, 0.02) while the diameters from Manufacturer II are 
N(0.78, 0.03). A purchaser needs cable that has diameter less than 0.82 inches. Which 
manufacturer should be used? 

3. A buyer requires a supplier to deliver parts that differ from 1.10 by no more than 0.05 
units. The parts are distributed according to N(1.12, 0.03). What proportion of the parts 
do not meet the buyer’s specifications? 

4. Manufactured parts have lifetimes in hours, X, that are distributed N(1000, 100). If 
800 < X < 1200, the manufacturer makes a profit of $50 per part. If X > 1200, the 
profit per part is $75; otherwise, the manufacturer loses $25 per part. What is the 
expected profit per part? 

5. The annual rainfall (in inches) in a certain region is normally distributed with μ = 40,
σ = 4. Assuming rainfalls in different years are independent, what is the probability
that in 2 of the next 4 years the rainfall will exceed 50 inches?

6. The weights of oranges in a good year are described by a normal distribution with 
μ = 16 and σ = 2 (ounces).

(a) What is the probability that a randomly selected orange has weight in excess of 17 
ounces? 

(b) Three oranges are selected at random. What is the probability the weight of exactly 
one of them exceeds 17 ounces? 

(c) How many oranges out of 10000 are expected to have weight between 15.4 and 
17.3 ounces? 

7. A sugar refinery has three processing plants, all receiving raw sugar in bulk. The 
amount of sugar in tons that each of the plants can process in a day has an exponential 
distribution with mean 4. 

(a) Find the probability a given plant processes more than 4 tons in a day. 
(b) Find the probability that at least two of the plants process more than 4 tons in a day. 
8. The length of life in hours, X, of an electronic component has an exponential probability
density function with mean 500 hours.

(a) Find the probability that a component lasts at least 900 hours.

(b) Suppose a component has been in operation for 300 hours. What is the probability
it will last for another 600 hours?

9. Students in an electrical engineering laboratory measure current in a circuit using an
ammeter. Due to several random factors, the measurement, X, follows the probability
density function

$$f(x) = 0.025x + b, \quad 2 < x < 6.$$

(a) Show that b = 0.15.

(b) Find the probability the measurement of the current exceeds 3 amps.

(c) Find E(X).

(d) Find the probability that all three laboratory partners measure the current independently
as less than 4 amps.

10. Let X be a random variable with probability density function

$$f(x) = \begin{cases} k, & 1 < x < 2 \\ k(3 - x), & 2 < x < 3. \end{cases}$$

(a) Find k.
(b) Calculate E(X).
(c) Find the cumulative distribution function, F(x).

11. The percentage, X, of antiknock additive in a particular gasoline, is a random variable
with probability density function

$$f(x) = kx^{3}(1 - x), \quad 0 < x < 1.$$

(a) Show that k = 20.

(b) Evaluate P[X < E(X)].

(c) Find F(x).

12. Suppose that $f(x) = 3x^{2}$, 0 < x < 1, is the probability density function for some random
variable X. Find P (x > : [x = : ).

13. A point B is chosen at random on a line segment AC of length 10. A right-angled
triangle with sides AB and BC is constructed. Determine the probability that the area
of the triangle is at least 7 square units.

14. A random variable, X, has probability density function

$$f(x) = \begin{cases} ax, & 0 < x < 3 \\ 6a - ax, & 3 < x < 6. \end{cases}$$

(a) Show that a = 1/9.
(b) Find P(X > 4).
15. Verify Tchebycheff’s inequality for $k = \sqrt{2}$ for the probability density function
$f(x) = \frac{1}{2}(x + 1)$, $-1 < x < 1$.

16. Suppose X has the distribution function

$$F(x) = \begin{cases} 0, & x < 0 \\ \frac{1}{a}x^{4}, & 0 < x < 2 \\ 1, & x > 2. \end{cases}$$

(a) Find a.
(b) Find P(X > 1).

17. Find the mean and variance of the random variable X whose probability density function is
f@® = (1 —xa-3) lex.

18. A random variable X has probability density function

$$f(x) = \begin{cases} 1 + x, & -1 < x < 0 \\ 1 - x, & 0 < x < 1. \end{cases}$$

Find $E[X^{2} - 2X + 2]$.

19. (a) Determine k so that f(x) = kxe is a probability density function for some non-negative
random variable X.

(b) Determine F(x) and sketch it.

20. The time (in seconds) a car has to wait for a certain traffic light has probability density
function

$$f(x) = \begin{cases} \dfrac{x}{2500}, & 0 < x < 50 \\ \dfrac{100 - x}{2500}, & 50 < x < 100. \end{cases}$$

(a) What is the probability that the waiting time is between 25 and 75 seconds?

(b) If a car has waited 25 seconds, what is the probability it will wait at least 25 seconds
more?

21. One hundred independent observations are made of the random variable X whose probability
density function is

$$f(x) = \begin{cases} x, & 0 < x < 1 \\ 2 - x, & 1 < x < 2. \end{cases}$$

Find the probability that at least 20 of these observations exceed 1.5.
22. X is a random variable with distribution function


0 x<—2 
F(@) = — —2<x%<2 
1 x>2 


Identify the probability density function for X. 
23. A random variable X has probability density function 


f(x) = 


Find F(x), being sure to specify this for any value of x. 
24. Find the constant c that makes g(y) = a 1 < y <2, a probability density function. 
25. Find the mean and variance of the random variable X whose probability density function is

$$f(x) = \begin{cases} 1 - x, & 0 < x < 1 \\ x - 1, & 1 < x < 2. \end{cases}$$
26. A random variable X has probability density function 
ocr Eee, 
2x e 
Two independent observations are made on X. Find the probability that one observation
is less than 1 and that the other observation is greater than 1.


27. A random variable X has distribution function 


0 x<-2 

- —2<x<-l 
F@w= 1 

5 -l<x<2 

1 x>2 


Find f(x). 


28. The probability density function for X, the lifetime in hours of a certain type of elec- 
tronic device, is given by 


— #10 
fam=4* 
0 x < 10. 
(a) Find P(X > 20).
(b) Find F(x). 
29. A random variable T has probability density function 


a(t) =k +0, t>0. 
Find P(T > 2|T > 1). 


30. A player can win a solitaire card game with probability 1/12. Find the probability that 
the player wins at least 10% of 500 games played. 
Chapter 4


Functions of Random Variables; 
Generating Functions; Statistical 
Applications 


4.1 INTRODUCTION 


We now want to expand our applications of statistical inference first encountered in 
Chapter 2. In particular we want to consider tests of hypotheses and the construction 
of confidence intervals when continuous random variables are involved; we will also 
introduce simple linear regression. These considerations have direct bearing on problems 
of data analysis such as that encountered in the following situation. 

A production process has been producing bearings with mean diameter 2.60 in.; the 
diameters exhibit some variability around this average value with the standard deviation of 
the diameters believed to be 0.03 in. A quality control inspector chooses a random sample 
of ten bearings and finds their average diameter to be 2.66 in. Has the process changed? 

The quality control inspector here has a single observation, namely 2.66 in., the average 
of ten observations. This is most commonly the situation: only one sample is available; 
decisions must be made on the basis of that single sample. Nonetheless we can speculate on 
what would happen were the sampling to be repeated. In that case, another sample average 
will most likely occur. In order to decide whether 2.66 in. is unusual or not, we must know 
the probability distribution of these sample means so that the variation in the mean from 
sample to sample can be assessed. We can then base a test of the hypothesis that the process 
mean has not changed on that probability distribution. Confidence intervals can similarly 
be constructed, but again, the probability distribution of the sample mean must be known. 

Determination of the probability distribution here is not particularly easy so we first 
need to make some mathematical considerations. This will not only enable us to consider 
the example at hand, but will also allow us to solve many other complex problems arising in 
the analysis of data. We also must investigate the distribution of the sample variance arising 
from samples drawn from a continuous distribution. 

We begin by considering functions of random variables; sums and averages arising 
from samples are examples of complex functions of sample values. Special functions called 
generating functions provide a particularly powerful technique for solving these problems. 
While developing these techniques we will solve many interesting problems in probability.
Finally we will show several practical statistical problems and their solutions, including a
statistical process control chart.


Probability: An Introduction with Statistical Applications, Second Edition. John J. Kinney.
© 2015 John Wiley & Sons, Inc. Published 2015 by John Wiley & Sons, Inc.


4.2 SOME EXAMPLES OF FUNCTIONS OF RANDOM 
VARIABLES 


In Chapter 3, the following problem was considered: an observation, X, was made from a 
uniform distribution on the interval [0, 1] and then a square of side X was formed. What is 
the expected value of its area? 

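This problem is fairly easily solved. Since $E(X) = \frac{1}{2}$ and

$$\operatorname{Var}(X) = E(X^{2}) - [E(X)]^{2} = \frac{1}{12},$$

it follows that

$$E(\text{Area}) = E(X^{2}) = \operatorname{Var}(X) + [E(X)]^{2} = \frac{1}{12} + [E(X)]^{2} = \frac{1}{12} + \frac{1}{4} = \frac{1}{3}.$$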

Other problems of a similar nature, however, may not be quite so easy. As another
example, suppose X is an exponential random variable with mean $\alpha$ and we seek $E(\sqrt{X})$.
It would be unreasonable to think, for example, that $E(\sqrt{X}) = \sqrt{E(X)}$. This expectation is,
after all, an integral and integrals rarely behave in such a simple manner.

The reader is invited to calculate, or use a computer algebra system, in this example to
find that

$$E(\sqrt{X}) = \int_{0}^{\infty} \sqrt{x} \cdot \frac{1}{\alpha} e^{-x/\alpha}\,dx = \frac{1}{2}\sqrt{\pi\alpha}.$$

Another frequently used technique for evaluating the integral encountered earlier is 
to select a random sample of values from an exponential distribution and then calculate 
the average value of their square roots. This technique, widely used in problems that prove 
difficult for analytical techniques, is known as simulation. A computer program chose 1000 
observations from an exponential distribution with $\alpha = 4$ and then calculated the mean of
the square roots of these values. The observed value was 1.800, while the expected value is
$\frac{1}{2}\sqrt{4\pi} = \sqrt{\pi} = 1.7725$, so the simulation produced a value quite close to the expected value.
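
The simulation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name mean_sqrt_exponential, the seed, and the sample size of 100,000 are choices made here, not the program used in the text, so the simulated value will differ from the 1.800 reported above.

```python
import math
import random


def mean_sqrt_exponential(alpha, n, seed):
    """Average of sqrt(X) over n simulated exponential observations with mean alpha."""
    rng = random.Random(seed)
    # random.expovariate expects the rate, which is 1 / mean.
    total = sum(math.sqrt(rng.expovariate(1.0 / alpha)) for _ in range(n))
    return total / n


alpha = 4.0
exact = 0.5 * math.sqrt(math.pi * alpha)  # closed form (1/2) * sqrt(pi * alpha)
estimate = mean_sqrt_exponential(alpha, 100_000, seed=1)
print(f"exact = {exact:.4f}, simulated = {estimate:.4f}")
```

Increasing the sample size should, on average, bring the simulated mean closer to the exact value.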

Expectations of many functions of random variables can be carried out by using the 
probability density function of X directly. In the first example (where X is uniformly dis- 
tributed on [0, 1] and denotes the length of the side of a square), suppose we wanted the 
probability that the area of the square was between 1/2 and 3/4. We can calculate 


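$$P\left(\tfrac{1}{2} \le X^{2} \le \tfrac{3}{4}\right) = P\left(\tfrac{1}{\sqrt{2}} \le X \le \tfrac{\sqrt{3}}{2}\right) = \frac{\sqrt{3}}{2} - \frac{\sqrt{2}}{2} = 0.15892,$$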


using the distribution of X directly. 
In the second example, supposing X is a random observation from an exponential dis- 
tribution with mean a, we calculate, for example, 


$$P(1 \le \sqrt{X} \le 2) = P(1 \le X \le 4) = \int_{1}^{4} \frac{1}{\alpha} e^{-x/\alpha}\,dx = e^{-1/\alpha} - e^{-4/\alpha},$$

so often probabilities involving functions of random variables can be found from the
distributions of the random variables themselves.
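
As a small check of this idea, the probability that the square root of an exponential variable lies between 1 and 2 can be computed from the exponential distribution function alone. The helper names exp_cdf and prob_sqrt_between and the value of the mean (4) are assumptions made for this sketch.

```python
import math


def exp_cdf(x, alpha):
    """Distribution function of an exponential variable with mean alpha."""
    return 1.0 - math.exp(-x / alpha) if x > 0 else 0.0


def prob_sqrt_between(a, b, alpha):
    """P(a < sqrt(X) < b) = P(a^2 < X < b^2) for 0 <= a < b."""
    return exp_cdf(b * b, alpha) - exp_cdf(a * a, alpha)


alpha = 4.0
p = prob_sqrt_between(1.0, 2.0, alpha)
closed_form = math.exp(-1 / alpha) - math.exp(-4 / alpha)
print(p, closed_form)
```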

Now suppose that we have two independent observations of a random variable X and
we consider the sum of these, $X_1 + X_2$. This is certainly a random variable. How can we
calculate $P(X_1 + X_2 < 2)$ if $X_1$ and $X_2$ are, for example, independent observations from the
exponential distribution? Clearly, this problem is not as simple as the preceding ones.

It is fortunate that there is another way to look at these problems. It turns out that this 
other view will solve these problems and has, in addition, considerable implications for 
the solutions of much more complex problems, solutions that are not easily found in any 
other way. Our approach will also explain why normality has occurred so frequently in our 
problems; the reason for this is not simple, as the reader might expect. 

The expressions $X^{2}$, $\sqrt{X}$, and $X_1 + X_2$ are functions of the random variable X. Since X
is a random variable, so too are these functions of X; then they have probability distribu- 
tions. If these probability distributions could be determined, then the earlier problems, as 
well as many others, could be solved, so we now consider one method for determining the 
probability distribution of a function of the random variable X. 


4.3 PROBABILITY DISTRIBUTIONS OF FUNCTIONS 
OF RANDOM VARIABLES 


We begin with an example discussed in Chapter 3. 


Example 4.3.1 


Suppose that a random variable X has a standard normal distribution, that is, X ~ N(0, 1).
Consider the random variable $Y = X^{2}$, so that Y is a quadratic function of the random
variable X. What is the probability density function for Y?

Our answer depends on the simple fact that when the derivative exists, and where f(x) 
and F(x) denote the probability density function and distribution function respectively, then 


$$\frac{dF(x)}{dx} = f(x).$$


Let g(y) and G(y) denote the probability density function and the distribution func- 
tion, respectively, for the random variable Y. Our basic strategy is to find G(y) and then to 
differentiate it, using the property above, to produce g(y). 

Here 


$$G(y) = P(Y \le y) = P(X^{2} \le y) = P(-\sqrt{y} \le X \le \sqrt{y}), \quad \text{so}$$
$$G(y) = F(\sqrt{y}) - F(-\sqrt{y}),$$


by a property of distribution functions. Now we differentiate throughout to find that 


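$$g(y) = \frac{dG(y)}{dy} = \frac{dF(\sqrt{y})}{dy} - \frac{dF(-\sqrt{y})}{dy}.$$

Now we must be careful because $\frac{dF(\sqrt{y})}{dy} \ne f(\sqrt{y})$. The problem lies in the fact that the
variables in the numerator and in the denominator are not the same. However the chain rule
comes to our rescue and we find that

$$g(y) = \frac{dF(\sqrt{y})}{d\sqrt{y}} \cdot \frac{d\sqrt{y}}{dy} - \frac{dF(-\sqrt{y})}{d(-\sqrt{y})} \cdot \frac{d(-\sqrt{y})}{dy}.$$

This becomes

$$g(y) = \frac{f(\sqrt{y})}{2\sqrt{y}} + \frac{f(-\sqrt{y})}{2\sqrt{y}}. \qquad (4.1)$$

But

$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^{2}/2}, \quad -\infty < x < \infty, \quad \text{and}$$

$$f(\sqrt{y}) = f(-\sqrt{y}) = \frac{1}{\sqrt{2\pi}} e^{-y/2}, \quad y > 0, \quad \text{so}$$

$$g(y) = \frac{1}{\sqrt{2\pi y}} e^{-y/2}, \quad y > 0.$$

This is the $\chi^{2}_{1}$ variable, first seen in Section 3.7. The domain of values for Y is
established from that for X: since $-\infty < X < \infty$, then $-\infty < \sqrt{y} < \infty$ or $\sqrt{y} < \infty$, so $y > 0$.
The same domain is correct for $-\sqrt{y}$ above.

The calculation that $\int_{0}^{\infty} g(y)\,dy = 1$, and the fact that $g(y) > 0$, checks our work and
shows that g(y) is a probability density function.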
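
A numerical sanity check of this example is sketched below. It relies on the identity F(√y) − F(−√y) = erf(√(y/2)) for the standard normal distribution function; the function names, the seed, and the sample size are assumptions chosen for illustration only.

```python
import math
import random


def G(y):
    """Distribution function of Y = X^2: F(sqrt(y)) - F(-sqrt(y)) = erf(sqrt(y/2))."""
    return math.erf(math.sqrt(y / 2.0)) if y > 0 else 0.0


def g(y):
    """Density obtained in the example, valid for y > 0."""
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)


# 1. The derivative of G should agree with g (central difference at y = 1.5).
h = 1e-5
numerical_derivative = (G(1.5 + h) - G(1.5 - h)) / (2 * h)

# 2. Squares of simulated standard normal values should follow G.
rng = random.Random(2)
n = 200_000
empirical = sum(rng.gauss(0.0, 1.0) ** 2 <= 1.0 for _ in range(n)) / n

print(numerical_derivative, g(1.5))
print(empirical, G(1.0))
```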

This process works well when the derivatives involved can be evaluated and that is 
often the case in the instances that interest us here. 

In the previous example, Y is a quadratic function of X and the resulting distribution 
for Y bears little resemblance to that for X. We expect that a linear function would preserve 
the shape of the distribution in a sense. We consider a specific example first. 


Example 4.3.2 


Suppose X is uniform on [3, 5] so that $f(x) = \frac{1}{2}$, $3 < x < 5$. Let $Y = \frac{X - 2}{3}$, a linear function
of X. Again we find the distribution function and differentiate it. Here

$$G(y) = P(Y \le y) = P\left(\frac{X - 2}{3} \le y\right) = P(X \le 3y + 2) = F(3y + 2).$$

Then

$$g(y) = \frac{dG(y)}{dy} = \frac{dF(3y + 2)}{dy} = \frac{dF(3y + 2)}{d(3y + 2)} \cdot \frac{d(3y + 2)}{dy}, \quad \text{so}$$

$$g(y) = f(3y + 2) \cdot 3 = \frac{3}{2}.$$

To establish the domain for y, note that $f(3y + 2) = \frac{1}{2}$ if $3 < 3y + 2 < 5$ which simplifies
to $\frac{1}{3} < y < 1$, producing the final result:

$$g(y) = \frac{3}{2}, \quad \frac{1}{3} < y < 1.$$

This is a probability density function since $g(y) > 0$ and $\int_{1/3}^{1} g(y)\,dy = 1$.

We observe that the linear transformation $Y = \frac{X - 2}{3}$ of X preserves the uniform
distribution with which we began.

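To consider the problem of a linear transformation in general, suppose that X has
probability density function f(x) and that Y = aX + b for some constants a and b provided that
$a \ne 0$. Then, taking $a > 0$,

$$G(y) = P(Y \le y) = P(aX + b \le y) = P\left(X \le \frac{y - b}{a}\right) = F\left(\frac{y - b}{a}\right), \quad \text{so}$$

$$g(y) = \frac{1}{a} f\left(\frac{y - b}{a}\right),$$

showing that the shape of the distribution is preserved under the linear transformation.
(If $a < 0$, dividing by a reverses the inequality, and the same argument gives
$g(y) = \frac{1}{|a|} f\left(\frac{y - b}{a}\right)$.)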
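
The general rule for a linear transformation, with the density of Y equal to f((y − b)/a) divided by |a|, can be tried on the uniform example above, where a = 1/3 and b = −2/3. The sketch below is illustrative; the helper names and the crude midpoint integration are assumptions made here.

```python
def uniform_pdf(x, low=3.0, high=5.0):
    """Density of a uniform variable on (low, high)."""
    return 1.0 / (high - low) if low < x < high else 0.0


def linear_transform_pdf(f, a, b):
    """Density of Y = a*X + b when X has density f and a is nonzero."""
    if a == 0:
        raise ValueError("a must be nonzero")
    return lambda y: f((y - b) / a) / abs(a)


def midpoint_integral(func, lo, hi, steps=20_000):
    """Crude midpoint-rule approximation to the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))


# Y = (X - 2) / 3 corresponds to a = 1/3 and b = -2/3.
g = linear_transform_pdf(uniform_pdf, a=1 / 3, b=-2 / 3)
total = midpoint_integral(g, 0.0, 2.0)
print(g(0.5), g(0.2), g(1.2), total)
```

Here g takes the value 3/2 inside (1/3, 1) and 0 outside it, and the total area is approximately 1.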

If the reader has not already done so, please note that it is crucial that the variables, 
denoted by capital letters, must be clearly distinguished from their values, denoted by small 
letters; otherwise, confusion, and most likely errors, will abound. 


Example 4.3.3 


Consider, one more time, the fair wheel where f(x) = 1, for 0 < x < 1. Now let us spin the
wheel, say n times, obtaining the random observations $X_1, X_2, \ldots, X_n$. We let Y denote the
largest of these, so that

$$Y = \max\{X_1, X_2, \ldots, X_n\}.$$

Y is clearly a nontrivial random variable. Again we seek the probability density
function for Y, g(y).

Note that if the maximum of the X’s is at most y, then each of the X’s must be at most
y. So

$$G(y) = P(Y \le y) = P(\max\{X_1, X_2, \ldots, X_n\} \le y) = P(X_1 \le y \text{ and } X_2 \le y \text{ and } \cdots \text{ and } X_n \le y).$$

Since the X’s are independent, it follows that

$$G(y) = [P(X_1 \le y)] \cdot [P(X_2 \le y)] \cdots [P(X_n \le y)].$$

Now, since all the X’s have the same probability density function,

$$G(y) = [P(X \le y)]^{n} = [F(y)]^{n}.$$

It follows that

$$g(y) = \frac{dG(y)}{dy} = n[F(y)]^{n-1} \cdot f(y).$$

Since the X’s all have the same uniform probability density function, in this case
F(y) = y so that $g(y) = ny^{n-1}$, for 0 < y < 1.

In the general case, we note that the distribution for Y is dependent on F(y). F(y) is
easy in this example, but it could prove intractable, as in the case of a normal variable which
has no closed form for its distribution function. In fact, the probability distribution of the
maximum observation from a random sample of observations from a normal distribution
has no closed form; it can be written only in terms of the normal distribution function,
as $[F(y)]^{n}$, and must be evaluated numerically.
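
The result that the distribution function of the maximum is F(y) raised to the power n can be checked by simulating spins of the fair wheel. In the sketch below the values n = 5 and y = 0.8, the seed, and the function name are assumptions chosen only for illustration.

```python
import random


def empirical_max_cdf(n, y, trials, seed):
    """Fraction of trials in which the largest of n uniform(0, 1) values is at most y."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if max(rng.random() for _ in range(n)) <= y:
            hits += 1
    return hits / trials


n, y = 5, 0.8
theory = y ** n  # G(y) = [F(y)]^n with F(y) = y for the fair wheel
estimate = empirical_max_cdf(n, y, trials=100_000, seed=3)
print(theory, estimate)
```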


Expectation of a Function of X 


In Chapters 1 and 2, we calculated expectations of functions of X using only the probability 
density function for X. Specifically, we let 


$$E[H(X)] = \int_{-\infty}^{\infty} H(x) \cdot f(x)\,dx, \qquad (4.2)$$


where H(X) is some function of the random variable X and f(x) is the probability density 
function for the random variable X. We took this as a matter of definition. For example, we 
wrote that 


$$E(X^{2}) = \int_{-\infty}^{\infty} x^{2} \cdot f(x)\,dx.$$


The reader may now well wonder a bit about this definition. The function H(X) is 
also a random variable. To find its expectation, should not we find its probability density 
function first and then the expectation of the random variable using that probability density 
function? It would appear to be a strategy that is certain of success. Amazingly, it turns out 
not to be necessary, and formula (4.2) gives the correct result. Let’s see why this is so.

To make matters simple, suppose Y = H(X) and that H(X) is a strictly increasing func- 
tion of X. (A demonstration similar to that given here can be given for H(X) strictly decreas- 
ing.) Then 

$$G(y) = P(Y \le y) = P[H(X) \le y] = P[X \le H^{-1}(y)],$$

since H(X) is invertible. This means that

$$G(y) = F[H^{-1}(y)] \quad \text{or}$$

$$g(y) = \frac{dF[H^{-1}(y)]}{dy}, \quad \text{so that}$$

$$g(y) = f[H^{-1}(y)] \cdot \frac{dH^{-1}(y)}{dy}, \quad \text{or}$$

$$g(y) = f(x) \cdot \frac{dx}{dy},$$

where x is expressed in terms of y. This formula can indeed be used in many of our change
of variable formulas but the reader is warned that the function must be strictly increasing
or strictly decreasing for this result to work (in the strictly decreasing case, dx/dy is
replaced by its absolute value). Now we calculate the expected value:

$$E(Y) = \int y \cdot g(y)\,dy = \int H(x) \cdot f(x) \cdot \frac{dx}{dy}\,dy = \int H(x) \cdot f(x)\,dx,$$


showing that our definition of the expectation of a function of X was sound. 


Example 4.3.4 


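Consider the probability density function

$$f(x) = k \cdot x^{2}, \quad 0 < x < 2.$$

We calculate $E(X^{2})$ in two ways: first, by finding the probability density function for
$X^{2}$, and second, without finding that probability density function.
k must be determined first. Since

$$\int_{0}^{2} k \cdot x^{2}\,dx = 1, \quad \text{it follows that}$$

$$k \cdot \frac{8}{3} = 1, \quad \text{so} \quad k = \frac{3}{8}.$$

Now consider the transformation $Y = X^{2}$.

$$G(y) = P(Y \le y) = P(X^{2} \le y) = P(X \le \sqrt{y}) = F(\sqrt{y}),$$

since X takes on only non-negative values.

Now $g(y) = \frac{dF(\sqrt{y})}{dy} = f(\sqrt{y}) \cdot \frac{1}{2\sqrt{y}}$, so

$$g(y) = \frac{3}{8} \cdot y \cdot \frac{1}{2\sqrt{y}} = \frac{3}{16}\sqrt{y}, \quad 0 < y < 4.$$

Then

$$E(Y) = \int_{0}^{4} y \cdot \frac{3}{16}\sqrt{y}\,dy = \frac{3}{16} \cdot \frac{2}{5} \cdot 4^{5/2} = \frac{12}{5}.$$

Now we use (4.2) directly:

$$E(X^{2}) = \int_{0}^{2} x^{2} \cdot \frac{3}{8}x^{2}\,dx = \frac{3}{8} \cdot \frac{32}{5} = \frac{12}{5},$$

obtaining the previous result.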
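
The two routes to the expectation in this example can also be compared numerically. The sketch below assumes the density (3/8)x² on (0, 2) found above and the density (3/16)√y on (0, 4) that the distribution function method gives for Y = X²; the midpoint rule and all function names are illustrative choices rather than part of the text.

```python
import math


def midpoint_integral(func, lo, hi, steps=100_000):
    """Crude midpoint-rule approximation to the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))


def f(x):
    """Density of X on (0, 2), with k = 3/8."""
    return 3.0 / 8.0 * x * x


def g(y):
    """Density of Y = X^2 on (0, 4), derived by the distribution function method."""
    return 3.0 / 16.0 * math.sqrt(y)


via_density_of_y = midpoint_integral(lambda y: y * g(y), 0.0, 4.0)
via_formula_4_2 = midpoint_integral(lambda x: x * x * f(x), 0.0, 2.0)
print(via_density_of_y, via_formula_4_2)
```

Both integrals should be close to 12/5 = 2.4.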
EXERCISES 4.3


1. 


12. 


13. 


Suppose that X is uniformly distributed on the interval (2,5). Let Y = 3X — 2. Find g(y), 
the probability density function for Y. 


2. Suppose that the probability density function for a random variable X is $f(x) = \lambda e^{-\lambda x}$,
$x \ge 0$, $\lambda > 0$. Let Y = 3 − X. Find g(y), the probability density function for Y.


. Let X have a uniform distribution on (0,1). Find the probability density function for 


Y = X? and prove that the result is a probability density function. 


. The random variable X has the probability density function f(x) = 2x, O<x< 1. 


(a) Let Y = X? and find the probability density function for Y. 

(b) Now suppose that X has the probability density function f(x). What transformation, 
Y = A(X), will result in Y having a uniform distribution? (Part (a) of this problem 
may help in discovering the answer.) 


5. Suppose that X ~ N(μ, σ), and let $Y = e^{X}$.

(a) Find the mean and variance of Y.

(b) Find the probability density function for Y. The result is called the lognormal
probability density function since the logarithm of the variable is N(μ, σ).


. Random variable X has probability density function f(x) = 4x(1 — x7), O<x< 1. 


Find E(X”) in two ways: 
(a) Without finding the probability density function of Y. 
(b) Using the probability density function of Y. 


. If X has a Weibull distribution with parameters a and f, show that the variable Y = 


xX : : : : 
(=) is an exponential variable with mean 1. 


8. The folded normal distribution is the distribution of |X| where X ~ N(μ, σ).

(a) Find the probability density function for a folded normal variable.
(b) Find E(|X|).


. Find the probability density function for Y = X* where X has an exponential distribu- 


tion with mean value 1. 


. A circle is drawn by choosing a radius from the uniform distribution on the interval 


(0, 1). Find the probability density function for the area of the circle. 


. Suppose that X is a uniform random variable on the interval (—1, 1). Find the probability 


density function for the variable Y = sin(X). 
Find the probability density function for Y = e* where X is uniformly distributed on 
[0, 1]. 


Random variable X has probability density function 


1 


FO) = a’ x>0 


(a) Find the probability density function for Y = x : 


(b) Show that POO< Y<b)=1- —s, where b > 0. 




14. 


15. 


16. 


17. 


18. 


19. 


20. 


21. 
22. 


23. 


24. 


A random variable X has the probability density function 
fo = E, alexi 


Find E(X?), 
(a) by first finding the probability density function for Y = X?. 
(b) without using the probability density function for Y = X?. 


A fluctuating electric current, I, is a uniformly distributed random variable on the
interval [9, 11]. If this current flows through a 2-ohm resistor, the power is $Y = 2I^{2}$. Find
E(Y) by first finding the probability density function for the power.


1 


A random variable X has the probability density function f(x) = az * > |. Find the 


probability density function for Y = 1 — x and prove that your result is a probability 
density function. 

Independent observations $X_1, X_2, X_3, \ldots, X_n$ are taken from the exponential
distribution $f(x) = \lambda e^{-\lambda x}$ where $x \ge 0$ and $\lambda > 0$. Find the probability density function for
$Y = \min(X_1, X_2, X_3, \ldots, X_n)$.

In triangle ABC, angle ABC is π/2, |AB| = 1 and angle BAC (in radians) is a random
variable, uniformly distributed on the interval [0, π/3]. Find the expected length of
side BC.


Find the probability density function for Y = X? if X has the probability density func- 
tion 1 
Gig (PSRs 
f@) = i 
x 
——p-, 0<7 <2. 
472 . 
Find the probability density function for Y = —InX if X is uniformly distributed on 
(0, 1). 
1 1. ‘ 
-)j=-=- — =e 9 
IsE (5) am ff@ =e, x20 


Find g(y), the probability density function for Y = X? if X is uniformly distributed on 
(-1, 2). 

Computers commonly produce random numbers that are uniform on the interval (0, 1). 
These can often be used to simulate random selections from other probability distribu- 
tions in the following way. Suppose we wish a function of the uniform variables to have 
a given probability distribution function, say g(y). Then, if G(y) is invertible, consider 
the transformation Y = G~!(X). Then, 


$$P(Y \le y) = P[G^{-1}(X) \le y] = P[X \le G(y)] = G(y)$$

since X is a uniform random variable, showing that Y has the required probability
density function.


(a) Find a function, Y, of a uniform (0, 1) random variable so that Y is uniform on 
(a, b). 

(b) Find a function, Y, of a uniform (0, 1) random variable so that Y has an exponential
distribution with expected value 1/λ.


Show that $E[H(X)] = \int_{-\infty}^{\infty} H(x) \cdot f(x) \cdot \left|\frac{dx}{dy}\right|\,dy$, where f(x) is the probability density
function of X, if H(X) is a strictly decreasing function of X.




25. Show, without using the probability density function of $\sqrt{X}$, that $E(\sqrt{X}) = \frac{1}{2}\sqrt{\pi\alpha}$ if
X is an exponential random variable with mean $\alpha$. [Hint: The variance of a N(0, 1)
variable is 1.]


i 1 1 ‘ 1 1 
26. Show that if 1al= =o a <x<oo and if ae then £0) = —* 


1 
i+?’ —o <y<o. 
27. An area is lighted by lamps whose length of life is exponential with mean 8000 hours. 
It is very important that some light be available in the area for 20,000 hours. How many 


lamps should be installed? 
28. Random variable X has a Cauchy distribution, that is

$$f(x) = \frac{1}{\pi(1 + x^{2})}, \quad -\infty < x < \infty.$$

Let $Y = \frac{1}{1 + X^{2}}$.

(a) Show that the probability density function of Y is

$$g(y) = \frac{1}{\pi\sqrt{y - y^{2}}}, \quad 0 < y < 1.$$

(b) Show that the distribution function for Y is

$$F(y) = \frac{2}{\pi}\arcsin(\sqrt{y}), \quad 0 < y < 1.$$

(c) Find E(Y) and Var(Y).
1 


29. f(x) = 1/3,3 <x < 6. Find the probability density function for Y = - - o* : 


4.4 SUMS OF RANDOM VARIABLES I 


Random variables can often be regarded as sums of other random variables. For example, 
if a coin is tossed and X, the number of heads that appear is recorded (X can only be 0 or 1), 
and subsequently the coin is tossed again and Y, the number of heads that appear is recorded, 
then clearly X + Y denotes the total number of heads that appear. So the total number of 
heads when two coins are tossed can be regarded as a sum of two individual (and, in this 
case, independent) random variables. Clearly, we expect X + Y to be a binomial random 
variable with n = 2. We can extend this argument to n tosses; the sum is then a binomial
random variable. In Chapter 1, we encountered the random variable denoting the sum when 
two dice are thrown, so we have actually considered sums before. 

Now we intend to study the behavior of sums of random variables primarily because 
the results are interesting and because the consequences have extensive implications to 
some problems in statistics. In this section, we start with some interesting results and 
examples. 


Example 4.4.1 


In the first example above, X is a random variable that takes on the values 1 or 0
with probabilities p and 1 − p, respectively. Y has a similar distribution, and since X
and Y are independent, the distribution of X + Y can be found by considering all the
possibilities:

$$P(X + Y = 0) = P(X = 0) \cdot P(Y = 0) = (1 - p)^{2}$$

$$P(X + Y = 1) = P(X = 0) \cdot P(Y = 1) + P(X = 1) \cdot P(Y = 0) = 2p(1 - p)$$

$$P(X + Y = 2) = P(X = 1) \cdot P(Y = 1) = p^{2}.$$


Recall that the individual variables X and Y are often called Bernoulli random variables; 
their sum, as the earlier calculation shows, is a binomial random variable with n = 2. Since 
the Bernoulli random variable can be regarded as a binomial variable with n = 1, we see 
that in this case the sum of two independent binomial variables is also binomial. This raises 
the question, “Is the sum of two independent binomial random variables in general always 
binomial?” 

To answer this question, let us proceed with a calculation. Suppose X is binomial (n, p) 
and Y is binomial (m, p) and let Z = X + Y. The event Z = z can arise in several mutually 
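exclusive ways: X = 0 and Y = z; X = 1 and Y = z − 1, and so on, until X = z and Y = 0.
So, assuming also that X and Y are independent,

$$P(Z = z) = \sum_{k=0}^{z} P(X = k) \cdot P(Y = z - k), \quad \text{or}$$

$$P(Z = z) = \sum_{k=0}^{z} \binom{n}{k} p^{k} (1 - p)^{n-k} \binom{m}{z-k} p^{z-k} (1 - p)^{m-(z-k)}.$$

This can be simplified to

$$P(Z = z) = p^{z}(1 - p)^{n+m-z} \sum_{k=0}^{z} \binom{n}{k} \binom{m}{z-k}.$$

But we recognize $\sum_{k=0}^{z} \binom{n}{k} \binom{m}{z-k}$ from the hypergeometric distribution as $\binom{n+m}{z}$. So

$$P(Z = z) = \binom{n+m}{z} p^{z}(1 - p)^{n+m-z}, \quad z = 0, 1, 2, \ldots, n + m,$$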


a binomial distribution with parameters n + m and p. This establishes the fact that sums of 
independent binomial random variables are also binomial. 

We note here, since E(X) = np and Var(X) = np(1 − p) (with similar results for Y), that
E(Z) = (n + m)p and Var(Z) = (n + m)p(1 − p). We summarize these results as follows:


E(X+Y)=E(X)+E(Y) and 
Var(X + Y) = Var(X) + Var(Y), 


since X and Y are independent. As we will see later, the assumption of independence is a 
crucial one here. 
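
The convolution argument can be checked numerically for particular values. In the sketch below the parameters n = 3, m = 4, and p = 0.3 are arbitrary choices made for illustration, as are the function names; the code forms the distribution of Z = X + Y term by term and compares it with the binomial distribution with parameters n + m and p.

```python
from math import comb


def binomial_pmf(n, p):
    """List of P(X = k) for k = 0, ..., n when X is binomial (n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(pmf_x, pmf_y):
    """Distribution of X + Y for independent X and Y on 0, 1, 2, ..."""
    out = [0.0] * (len(pmf_x) + len(pmf_y) - 1)
    for i, px in enumerate(pmf_x):
        for j, py in enumerate(pmf_y):
            out[i + j] += px * py
    return out


n, m, p = 3, 4, 0.3
pmf_z = convolve(binomial_pmf(n, p), binomial_pmf(m, p))
target = binomial_pmf(n + m, p)

mean_z = sum(z * q for z, q in enumerate(pmf_z))
var_z = sum(z * z * q for z, q in enumerate(pmf_z)) - mean_z**2
print(mean_z, var_z)
```

The mean and variance computed from the convolution should agree with (n + m)p and (n + m)p(1 − p).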

We note here that the sum of independent binomial random variables is again binomial. 
Occasionally, random variables are reproductive in the sense that their sums are distributed 
in the same way as the summands, but this is not always the case. In fact, it is not the case 
with binomials if the probability of success at any trial for the random variable X differs 
from the probability of success at any trial for the random variable Y. We turn now to
an example where the probability distribution of the sum is not of the same form as the
summands.


Example 4.4.2 


Suppose X and Y are each discrete uniform random variables, that is, 
$$P(X = x) = \frac{1}{n}, \quad x = 1, 2, \ldots, n,$$


with an identical distribution for Y. What happens if we add two randomly chosen obser- 
vations? We investigate the probability distribution of the sum, Z = X + Y, assuming that 
X and Y are independent. 

The special case n = 4 may be instructive. Then if we wanted to find, for example, 
P(Z = 6) we could work out all the possibilities: 


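$$P(Z = 6) = P(X = 2) \cdot P(Y = 4) + P(X = 3) \cdot P(Y = 3) + P(X = 4) \cdot P(Y = 2)$$

$$= \left(\frac{1}{4}\right)\left(\frac{1}{4}\right) + \left(\frac{1}{4}\right)\left(\frac{1}{4}\right) + \left(\frac{1}{4}\right)\left(\frac{1}{4}\right) = \frac{3}{16}.$$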


Proceeding in a similar way for other values of z, we find 


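$$P(Z = z) = \begin{cases} \frac{1}{16}, & z = 2 \\ \frac{2}{16}, & z = 3 \\ \frac{3}{16}, & z = 4 \\ \frac{4}{16}, & z = 5 \\ \frac{3}{16}, & z = 6 \\ \frac{2}{16}, & z = 7 \\ \frac{1}{16}, & z = 8. \end{cases}$$

This result can also be summarized as

$$P(Z = z) = \begin{cases} \dfrac{z - 1}{16}, & z = 2, 3, 4, 5 \\ \dfrac{9 - z}{16}, & z = 6, 7, 8. \end{cases}$$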

A graph of this is shown in Figure 4.1. 

The sum is certainly not uniform. It is not clear what might happen when we increase 
the number of summands. We might conjecture that, as we add more independent uniform 
random variables, the sums become normal. This is in fact the case, but we need to develop 
some techniques before we can consider that circumstance and verify the normality. We 
will begin to do that in the next section. 
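
The probabilities for the sum of two discrete uniform observations can be enumerated exactly. The following sketch uses exact fractions for the case n = 4; the function name sum_of_two_uniform is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product


def sum_of_two_uniform(n):
    """Exact distribution of X + Y for independent X, Y uniform on 1, ..., n."""
    p = Fraction(1, n)
    pmf = {}
    for x, y in product(range(1, n + 1), repeat=2):
        pmf[x + y] = pmf.get(x + y, 0) + p * p
    return pmf


pmf = sum_of_two_uniform(4)
for z in sorted(pmf):
    print(z, pmf[z])
```

The probabilities rise in steps of 1/16 up to z = 5 and then fall symmetrically, which is the triangular shape described above.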
Figure 4.1 Sum of two independent discrete uniform variables, n = 4. (Vertical axis: probability; horizontal axis: sum.)


EXERCISES 4.4 


1. Verify all the probabilities in Example 4.4.1.
2. Random variable X has the probability density function given in the following table 
x 1 2 3 4 
1 1 1 1 
Wa 6 4 a 


(a) Find the probability density function for two independent observations $X_1$ and $X_2$.
(b) Verify that $E[X_1 + X_2] = E[X_1] + E[X_2]$ and that $\operatorname{Var}[X_1 + X_2] = \operatorname{Var}[X_1] + \operatorname{Var}[X_2]$.


3. Show that the sum of two independent Poisson variables with parameters $\lambda_1$ and $\lambda_2$,
respectively, has a Poisson distribution with parameter $\lambda_1 + \lambda_2$.


4. Let X and Y be independent geometric random variables so that $P(X = x) = (1 - p)^{x-1} \cdot p$,
x = 1, 2, 3, ... with a similar distribution for Y. Show, if X and Y are independent,
that X + Y has a negative binomial distribution.


. Find the probability distribution for X + Y + Z where X, Y, and Z each have a discrete 


uniform distribution on the integers 1, 2, 3, and 4. 


6. Let X denote a Bernoulli random variable, that is, P(X = 1) = p and P(X = 0) = 1 − p
and let Y be a binomial random variable with parameters n and p. Show that X + Y is
binomial with parameters n + 1 and p.


7. A coin, loaded so as to come up heads with probability 2/3, is thrown until a head
   appears, then a fair coin is thrown until a head appears.
   (a) Find the probability distribution for Z, the total number of tosses necessary.
   (b) Find the expected value for Z from the probability distribution for Z.

8. Phone calls come into an office according to a Poisson distribution with four calls
   expected in an interval of 2 minutes. The calls are answered according to a binomial
   process with p = 1/2. Find the probability that exactly three calls are answered in a
   two-minute period.

9. Generalize Problem 8: Consider Poisson events, in a given interval of time, with
   parameter $\lambda$, which are recorded according to a binomial process with parameter p. Show
   that the number of events recorded in the interval of time is Poisson with parameter $\lambda p$.
At this point we have found the probability distribution functions of sums of random vari-
ables by working out the probabilities for each possible value of the sum. This technique 
of course cannot be carried out when the summands are continuous or when the number of 
summands is large. 

We consider now another technique that will make some complex problems involving 
sums of either discrete or continuous random variables tractable. We start with the discrete 
case in this section. 


Example 4.5.1 


Consider throwing a fair die once, and this function:

\[ G(t) = \frac{1}{6}\left(t + t^2 + t^3 + t^4 + t^5 + t^6\right). \]

If X is the random variable denoting the face showing on the die, then we observe
that the coefficient of $t^k$ in G(t) is the probability that X equals k, P(X = k). For example,
P(X = 3) is the coefficient of $t^3$, which is 1/6. Since G(t) has this property, it is called a
probability generating function.

If X is a random variable taking values on the nonnegative integers, then any function
of the form

\[ \sum_{k=0}^{\infty} t^k \cdot P(X = k) \]

is called a probability generating function.

Note that in G(t) we could easily load the die by altering the coefficients of the powers
of t to reflect the different probabilities with which the faces appear. For example, the
function

\[ A(t) = \frac{1}{10}t + \frac{1}{5}t^2 + \frac{1}{10}t^3 + \frac{1}{5}t^4 + \frac{1}{5}t^5 + \frac{1}{5}t^6 \]

generates probabilities on a die loaded so that faces numbered 1 and 3 appear with
probability 1/10 while each of the other faces appears with probability 1/5.

Probability generating functions are of great importance in probability; they provide 
neat summaries of probability distributions and have other remarkable properties as we 
will see. 

Continuing our example, if we square G(t) we see that

\[ G^2(t) = \frac{1}{36}\left(t^2 + 2t^3 + 3t^4 + 4t^5 + 5t^6 + 6t^7 + 5t^8 + 4t^9 + 3t^{10} + 2t^{11} + t^{12}\right). \]

$G^2(t)$ is also a probability generating function: its coefficients are the probabilities of
the sums when two dice are thrown.

In general, $G^n(t)$ generates the probabilities

\[ P(X_1 + X_2 + X_3 + \cdots + X_n = k), \]

where $X_i$ is the face showing on die i, i = 1, 2, ..., n.
We may use this fact to find, for example, the probability that when four fair dice are
thrown a sum of 17 is obtained. We would find this to be a very difficult problem if we
were constrained to write out all the possibilities for which the sum is 17. Since G(t) can
be written as

\[ G(t) = \frac{t(1 - t^6)}{6(1 - t)}, \]

it follows that

\[ G^4(t) = \frac{t^4 (1 - t^6)^4}{6^4 (1 - t)^4}. \]

This reduces our problem to finding the coefficient of $t^{13}$ in $(1 - t^6)^4 (1 - t)^{-4}$.
Expanding this by the binomial theorem and ignoring the division by $6^4$ for the moment,
we see that the coefficient we seek is that of $t^{13}$ in

\[ \left[1 - \binom{4}{1} t^6 + \binom{4}{2} t^{12} - \cdots\right] \cdot \left[1 + \binom{-4}{1}(-t) + \binom{-4}{2}(-t)^2 + \cdots\right]. \]

So the coefficient of $t^{13}$ is

\[ \binom{-4}{13}(-1)^{13} - \binom{4}{1}\binom{-4}{7}(-1)^{7} + \binom{4}{2}\binom{-4}{1}(-1)^{1}
   = \binom{16}{13} - 4\binom{10}{7} + 6\binom{4}{1} = 104. \]

Therefore the probability we seek is $104/6^4 = 13/162$.

This process is certainly an improvement on that of counting all the possibilities, a
technique that clearly becomes impossible when the number of dice is large.

A computer algebra system allows us to find

\[ G^4(t) = \frac{1}{6^4}\left(t^4 + 4t^5 + 10t^6 + \cdots + t^{24}\right), \]


giving directly the following table of probabilities for the sums on four fair dice: 


Sum   Probability     Sum   Probability
  4   1/1296           15   35/324
  5   1/324            16   125/1296
  6   5/648            17   13/162
  7   5/324            18   5/81
  8   35/1296          19   7/162
  9   7/162            20   35/1296
 10   5/81             21   5/324
 11   13/162           22   5/648
 12   125/1296         23   1/324
 13   35/324           24   1/1296
 14   73/648
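
In place of a computer algebra system, the powers of G(t) can be found with a few lines of code. The sketch below assumes a polynomial is stored as a list of coefficients indexed by the power of t; the helper names `poly_mul` and `poly_pow` are ours and do not come from any particular package.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of t)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_pow(a, n):
    """Raise the polynomial a to the nonnegative integer power n."""
    result = [Fraction(1)]  # the constant polynomial 1
    for _ in range(n):
        result = poly_mul(result, a)
    return result


# G(t) = (t + t^2 + ... + t^6) / 6 for one fair die
G = [Fraction(0)] + [Fraction(1, 6)] * 6
G4 = poly_pow(G, 4)

# The coefficient of t^k in G^4(t) is P(sum on four dice = k)
for k in range(4, 25):
    print(k, G4[k])
```

Changing the exponent in `poly_pow(G, 4)` gives the distribution for any number of dice, and replacing the coefficients of `G` handles a loaded die.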
While the computer can give us high powers of G(t), it can also give us great insight
into the problem. Consider a graph of the coefficients of G(t) as shown in Figure 4.2.

Figure 4.2 Probabilities for one fair die. (Horizontal axis: Point; vertical axis: Probability.)


Now consider $G^2(t)$, whose coefficients are shown in Figure 4.3.

Figure 4.3 Probabilities for sums on two fair dice. (Horizontal axis: Sum; vertical axis: Probability.)


A graph of the coefficients in $G^4(t)$ is shown in Figure 4.4.

Finally, Figure 4.5 shows the probabilities for sums on 12 fair dice. 

This is probably enough to convince the reader that normality, once again, is involved. 
The probability distribution for the sums on 12 fair dice is in fact remarkably close to a 
normal curve. We find, for example, that 


P(36 ≤ Sum ≤ 48) = 0.724753, exactly,


while the normal curve gives 0.728101, a very good approximation. 
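
The comparison can be reproduced along the following lines. This is only a sketch under stated assumptions: we read the event as 36 ≤ Sum ≤ 48 (endpoints included), and we assume the normal value is obtained with a continuity correction, using mean 12(7/2) = 42 and variance 12(35/12) = 35. The function names are ours.

```python
from fractions import Fraction
from math import erf, sqrt


def dice_sum_counts(n_dice, faces=6):
    """Count the ways each total can occur when n_dice fair dice are thrown."""
    counts = {0: 1}
    for _ in range(n_dice):
        new_counts = {}
        for total, ways in counts.items():
            for face in range(1, faces + 1):
                new_counts[total + face] = new_counts.get(total + face, 0) + ways
        counts = new_counts
    return counts


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


counts = dice_sum_counts(12)
favorable = sum(ways for total, ways in counts.items() if 36 <= total <= 48)
exact_prob = Fraction(favorable, 6 ** 12)

mean = 12 * 3.5                 # 12 * E(X) for one die
sd = sqrt(12 * 35 / 12)         # sqrt(12 * Var(X)) for one die
# Assumed continuity correction: integrate the normal curve from 35.5 to 48.5
normal_approx = normal_cdf((48.5 - mean) / sd) - normal_cdf((35.5 - mean) / sd)

print(float(exact_prob), normal_approx)
```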
Before the normality can be explained analytically we must consider some more
characteristics of probability generating functions. We will consider these in the next section.

Figure 4.4 Probabilities for sums on 4 fair dice. (Horizontal axis: Sum, 4 through 24; vertical axis: Probability.)

Figure 4.5 Probabilities for sums on 12 fair dice. (Horizontal axis: Sum, 12 through 72; vertical axis: Probability.)


EXERCISES 4.5 


1. Find all the probabilities when three fair dice are thrown.

2. (a) Find the generating function when a fair coin is tossed.
   (b) Use part (a) to find the probability of no heads when a fair coin is tossed five times.

3. Show that the function $(1 + t)^n$ generates the binomial coefficients $\binom{n}{k}$,
   k = 0, 1, 2, ..., n.

4. What sequence is generated by $(1 - 4t)^{-1/2}$?

5. Consider the set {a, b, c}. What does the function
   (1 + at)(1 + bt)(1 + ct) generate?

6. Find a function that generates the sequence $0^2, 1^2, 2^2, 3^2, \ldots$
1 1 1 1
7. Find a function that generates the sequence To? 3a? 34? 38° 


8. A die is loaded so that the probability a face appears is proportional to the face. If the 
die is thrown five times, find the probability the sum obtained is 17. 

9. Suppose that the probability generating function for the random variable X is $P_X(t)$.
Find an expression for the probability generating function for
(a) X + k, where k is a constant.
(b) k · X, where k is a constant.

10. Verify in Example 4.5.1 that if X is the sum on 12 fair dice then P(36 ≤ X ≤ 48) =
0.724753.

11. A fair die and a die loaded so that the probability a face appears is proportional to
the face are thrown. Find the probability distribution for the sums appearing
on the uppermost faces.


4.6 SOME PROPERTIES OF GENERATING FUNCTIONS 


Let us first explain why the products of generating functions generate probabilities
associated with sums of random variables.

Suppose A(t) and B(t) are probability generating functions for independent random
variables X and Y, respectively, where X and Y are defined on the set of nonnegative integers
or some subset of them, and let

\[ A(t) = a_0 + a_1 t + a_2 t^2 + \cdots \quad \text{and} \quad B(t) = b_0 + b_1 t + b_2 t^2 + \cdots. \]

Then

\[ A(t) \cdot B(t) = a_0 b_0 + (a_0 b_1 + a_1 b_0)t + (a_0 b_2 + a_1 b_1 + a_2 b_0)t^2 + \cdots, \]

so the coefficient of $t^k$ is

\[ \sum_{i=0}^{k} a_i b_{k-i} = \sum_{i=0}^{k} P(X = i) \cdot P(Y = k - i) = P(X + Y = k), \]

where the last equality uses the independence of X and Y.

This explains why we could find powers of G(t) in Example 4.5.1 and generate
probabilities associated with throwing more than one die.


Since $E(t^X) = \sum_{i=0}^{\infty} t^i P(X = i)$, it follows that a probability generating
function, say $P_X(t)$, can be regarded as an expectation:

\[ P_X(t) = E(t^X) = \sum_{i=0}^{\infty} t^i P(X = i) \]

for a random variable X.

For example, $G(t) = \sum_{i=1}^{6} t^i \cdot P(X = i) = \sum_{i=1}^{6} t^i \cdot \frac{1}{6}$.

Note that if t = 1, then

\[ P_X(1) = \sum_{i=0}^{\infty} P(X = i) = 1. \]

Also

\[ P_X(t) = \sum_{i=0}^{\infty} t^i \cdot P(X = i), \]

from which it follows that

\[ P_X'(t) = \sum_{i=0}^{\infty} i \cdot t^{i-1} \cdot P(X = i). \]

So

\[ P_X'(1) = \sum_{i=0}^{\infty} i \cdot P(X = i) = E(X). \]

In addition,

\[ P_X''(t) = \sum_{i=0}^{\infty} i \cdot (i - 1) \cdot t^{i-2} \cdot P(X = i). \]

So $P_X''(1) = E[X \cdot (X - 1)]$, with similar results holding for higher order derivatives.

Since $E(X^2) = E[X \cdot (X - 1)] + E(X)$, it follows that

\[ Var(X) = E[X \cdot (X - 1)] + E(X) - [E(X)]^2 \quad \text{or} \]
\[ Var(X) = P_X''(1) + P_X'(1) - [P_X'(1)]^2. \]
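As an example, consider throwing a single die and let

\[ G(t) = \frac{1}{6}\left(t + t^2 + t^3 + t^4 + t^5 + t^6\right). \]

Then

\[ G'(t) = \frac{1}{6}\left(1 + 2t + 3t^2 + 4t^3 + 5t^4 + 6t^5\right). \]

So

\[ G'(1) = \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = \frac{7}{2}, \]

giving E(X), and

\[ G''(t) = \frac{1}{6}\left(2 + 6t + 12t^2 + 20t^3 + 30t^4\right). \]

So

\[ G''(1) = \frac{1}{6}(2 + 6 + 12 + 20 + 30) = \frac{35}{3}. \]

It follows that

\[ Var(X) = P_X''(1) + P_X'(1) - [P_X'(1)]^2 = \frac{35}{3} + \frac{7}{2} - \left(\frac{7}{2}\right)^2 = \frac{35}{12}. \]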
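
When a probability generating function is a polynomial, the derivatives at t = 1 reduce to weighted sums of the coefficients, so the mean and variance can be computed directly. The following sketch assumes the coefficients are held in a list indexed by the power of t; the function name `pgf_mean_var` is ours.

```python
from fractions import Fraction


def pgf_mean_var(coeffs):
    """Return (mean, variance) from the coefficients of a polynomial pgf.

    coeffs[i] is the coefficient of t**i, that is, P(X = i).
    """
    d1 = sum(i * p for i, p in enumerate(coeffs))            # P'(1)  = E(X)
    d2 = sum(i * (i - 1) * p for i, p in enumerate(coeffs))  # P''(1) = E[X(X - 1)]
    return d1, d2 + d1 - d1 ** 2


# One fair die: G(t) = (t + t^2 + ... + t^6) / 6
die = [Fraction(0)] + [Fraction(1, 6)] * 6
mean, var = pgf_mean_var(die)
print(mean, var)
```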
4.7 PROBABILITY GENERATING FUNCTIONS FOR SOME
SPECIFIC PROBABILITY DISTRIBUTIONS


Probability generating functions for the binomial and geometric random variables are
particularly useful, so we derive them in this section.


Binomial Distribution 


For the binomial distribution,

\[ P_X(t) = E(t^X) = \sum_{x=0}^{n} t^x \binom{n}{x} p^x q^{n-x}. \]

This sum can be written as

\[ P_X(t) = \sum_{x=0}^{n} \binom{n}{x} (pt)^x q^{n-x}. \]

Now the binomial theorem shows that $P_X(t) = (q + pt)^n$. It is easy to check that

\[ P_X'(t) = np(q + pt)^{n-1} \]

so that, since p + q = 1, $P_X'(1) = np$ as expected. Also,

\[ P_X''(t) = n(n - 1)p^2 (q + pt)^{n-2}, \]

so $E[X(X - 1)] = P_X''(1) = n(n - 1)p^2$, from which it follows that

\[ Var(X) = n(n - 1)p^2 + np - (np)^2 = np - np^2 = npq. \]

Now we show, using probability generating functions, that the sum of independent
binomial random variables with the same probability of success is binomial. Suppose that
X and Y are independent binomial variables with the probability generating functions
$P_X(t) = (q + pt)^{n_x}$ and $P_Y(t) = (q + pt)^{n_y}$, respectively. If Z = X + Y, then the
probability generating function for Z is

\[ P_Z(t) = P_X(t) \cdot P_Y(t) = (q + pt)^{n_x} \cdot (q + pt)^{n_y} = (q + pt)^{n_x + n_y}. \]

Assuming that the probability generating functions are unique, that is, assuming that
a probability generating function can arise from one and only one probability distribution
function, this shows that Z is binomial with parameters $n_x + n_y$ and p.

The derivation above, done in one line, shows the power of the probability generating 
function technique; the reader can compare this with the derivation in Example 4.4.1. 

It should be pointed out, however, as may have occurred to the reader, that the fact
that a sum of binomials, with the same probability of success at every trial, is binomial is
hardly surprising. If we have a series of $n_x$ binomial trials and we record X, the number
of successes, and follow this by a series of $n_y$ trials recording Y successes, it is obvious,
since the trials are independent, that we have X + Y successes in $n_x + n_y$ trials. The fact that
we paused somewhere in the experiment to record the number of successes so far and then
continued has no effect on the entire series of trials.

This raises the question of pausing in the series and changing the probability of success
at that point. Now the resulting distribution is not at all obvious. Such trials are,
confusingly perhaps, called Poisson’s trials. This problem can be considered using generating
functions.


Poisson’s Trials 


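As an example, suppose we toss a fair coin 20 times followed by 10 tosses of a coin loaded
so that the probability of a head is 1/3. What is the probability of exactly 15 heads resulting?
Using probability generating functions, we see that we need the coefficient of $t^{15}$ in

\[ \left(\frac{1}{2} + \frac{1}{2}t\right)^{20} \left(\frac{2}{3} + \frac{1}{3}t\right)^{10}. \]

This is

\[ \sum_{x=5}^{15} \binom{20}{x} \left(\frac{1}{2}\right)^{x} \left(\frac{1}{2}\right)^{20-x} \binom{10}{15-x} \left(\frac{1}{3}\right)^{15-x} \left(\frac{2}{3}\right)^{x-5}
   = \frac{156031933}{1289945088} = 0.12096. \]

A computer algebra system will give this result as well as all the other coefficients in
$\left(\frac{1}{2} + \frac{1}{2}t\right)^{20} \left(\frac{2}{3} + \frac{1}{3}t\right)^{10}$ directly, so it is of immense value in problems of this sort.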


A graph of these coefficients is remarkably normal as shown in Figure 4.6. 
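
A calculation of this kind can also be done with a short program rather than a computer algebra system. The sketch below multiplies the two binomial generating functions coefficient by coefficient; the names `binomial_pmf` and `total` are ours, introduced only for this illustration.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Coefficients of (q + p t)**n, that is, P(X = k) for k = 0, ..., n."""
    q = 1 - p
    return [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]


fair = binomial_pmf(20, Fraction(1, 2))    # 20 tosses of the fair coin
loaded = binomial_pmf(10, Fraction(1, 3))  # 10 tosses of the loaded coin

# Product of the two generating functions: total[k] = P(X + Y = k)
total = [Fraction(0)] * (len(fair) + len(loaded) - 1)
for i, a in enumerate(fair):
    for j, b in enumerate(loaded):
        total[i + j] += a * b

p15 = total[15]
print(p15, float(p15))
```

The list `total` holds all 31 coefficients, so the same run supplies every probability needed for a graph of the distribution.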

If X is the number of heads in the first series and Y is the number of heads in the
second series, it is still true that E(X + Y) = E(X) + E(Y). In this example,
$E(X + Y) = 20 \cdot \frac{1}{2} + 10 \cdot \frac{1}{3} = \frac{40}{3}$ and, since the tosses are independent,

\[ Var(X + Y) = Var(X) + Var(Y) = 20 \cdot \frac{1}{2} \cdot \frac{1}{2} + 10 \cdot \frac{1}{3} \cdot \frac{2}{3} = \frac{65}{9}. \]

Figure 4.6 Probabilities for the total number of heads when a fair coin is tossed 20 times followed by 10 tosses
of a loaded coin with p = 1/3. (Horizontal axis: Heads, 0 through 30; vertical axis: Probability.)

We would expect a normal curve with these parameters to fit the distribution of X + Y
fairly well.


Example 4.7.1 


A series of n binomial trials with probability p is conducted, and is followed by a series of
m trials with probability x/n, where x is the number of successes in the first series of trials.
Let Y denote the number of successes in the second series of trials. Then

\[ E(X + Y) = E(X) + E(Y) = n \cdot p + m \cdot \frac{np}{n} = p \cdot (n + m). \]

The variance of the sum is another matter, however, since the second series of trials
is very clearly dependent on the first series and, because of this, Var(X + Y) ≠ Var(X) +
Var(Y). General calculations of this sort will be considered in Chapter 5 when we discuss
sample spaces with two or more random variables defined on them.
For now consider, as an example of this, a first series comprising five trials with
probability of success 1/2, followed by a series of three trials. What is the probability of
exactly four successes in the entire experiment? We find that

\[ P(X + Y = 4) = \sum_{x=1}^{4} \binom{5}{x} \left(\frac{1}{2}\right)^{5} \binom{3}{4-x} \left(\frac{x}{5}\right)^{4-x} \left(1 - \frac{x}{5}\right)^{x-1} = \frac{73}{400}. \]

Again a computer algebra system is of great use in doing the calculations.
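
As a check on a sum of this kind, the following sketch conditions on the number of successes x in the first series and adds the contributions. The function names and default arguments are ours; the defaults simply restate the five trials with p = 1/2 followed by three trials with probability x/5.

```python
from fractions import Fraction
from math import comb


def binom(n, k, p):
    """P(exactly k successes in n binomial trials with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def prob_total(total, n=5, m=3, p=Fraction(1, 2)):
    """P(X + Y = total), where Y is binomial(m, x/n) given X = x."""
    prob = Fraction(0)
    for x in range(n + 1):
        y = total - x
        if 0 <= y <= m:
            prob += binom(n, x, p) * binom(m, y, Fraction(x, n))
    return prob


print(prob_total(4))
```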

Geometric Distribution


The waiting time, X, for the first success in a series of binomial trials with parameter
p has the probability distribution function

\[ P(X = x) = q^{x-1} p, \quad x = 1, 2, 3, \ldots \]

so

\[ P_X(t) = \sum_{x=1}^{\infty} t^x q^{x-1} p = \frac{p}{q} \sum_{x=1}^{\infty} (qt)^x = \frac{pt}{1 - qt}, \]

provided that |qt| < 1. Since 0 < q < 1, and we are only interested when t = 1, the
restriction is not important for us.

Using $P_X(t)$ we find that $P_X'(1) = E(X) = \frac{1}{p}$ and that $Var(X) = \frac{q}{p^2}$.

The variable X here denotes the waiting time for the first binomial success. When we
wait for the rth success, say, the negative binomial distribution arises. Since a negative
binomial variable is the sum of independent geometric variables, it follows, if X is now the
waiting time for the rth binomial success, that

\[ P_X(t) = \left(\frac{pt}{1 - qt}\right)^{r} = \frac{(pt)^r}{(1 - qt)^r}. \]

$P_X(t)$ can be used to show that the negative binomial distribution has mean r/p and
variance $rq/p^2$. This is left as an exercise for the reader.
Collecting Premiums in Cereal Boxes


Your favorite breakfast cereal, in an effort to urge you to buy more cereal, encloses a toy 
or a premium in each box. How many boxes must you buy in order to collect all the pre- 
miums? This problem is also often called the coupon collector’s problem in the literature 
on probability theory. Of course we cannot be certain to collect all the premiums, given 
finite resources, but we could think about the average, or expected number, of boxes to be 
purchased. 

To make matters specific, suppose there are 6 premiums. The first box gives us a pre- 
mium we did not have before. The probability the next box will not duplicate the premium 
we already have is 5/6. This waiting time for the next premium not already collected is a 
geometric random variable, with probability 5/6. The expected waiting time for the sec- 
ond premium is then 1/(5/6). Now we have two premiums, so the probability the next 
box contains a new premium is 4/6. This is again a geometric variable and our waiting 
time for collecting the third premium is 1/(4/6). This process continues. Since the expec- 
tation of a sum is the sum of the expectations of the summands and if we let X denote 
the total number of boxes purchased in order to secure all the premiums, we conclude 
that 


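\[ E(X) = 1 + \frac{1}{5/6} + \frac{1}{4/6} + \frac{1}{3/6} + \frac{1}{2/6} + \frac{1}{1/6} \]

\[ E(X) = 1 + \frac{6}{5} + \frac{6}{4} + \frac{6}{3} + \frac{6}{2} + \frac{6}{1} \]

\[ E(X) = 1 + 1.2 + 1.5 + 2 + 3 + 6 = 14.7 \text{ boxes.} \]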


Clearly the cereal company knows what it is doing! An exercise will ask the reader to 
show that the variance of X is 38.99, so unlucky cereal eaters could be in for buying many 
more boxes than the expectation would indicate. 
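
The expectation and variance quoted here follow from adding the means 1/p and variances q/p² of the successive independent geometric waiting times. A minimal sketch, with a function name of our own choosing, is:

```python
from fractions import Fraction


def collector_mean_var(n):
    """Mean and variance of the number of boxes needed to collect n premiums.

    With `have` premiums already collected, the wait for a new one is
    geometric with p = (n - have) / n, mean 1/p and variance q/p**2.
    """
    mean = Fraction(0)
    var = Fraction(0)
    for have in range(n):
        p = Fraction(n - have, n)
        mean += 1 / p
        var += (1 - p) / p ** 2
    return mean, var


mean, var = collector_mean_var(6)
print(float(mean), float(var))
```

The first box is included as the case p = 1, which contributes 1 to the mean and nothing to the variance.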

This is an example of a series of trials, analogous to Poisson’s trials, in which the
probabilities vary. Since the total number of trials, X, can be regarded as a sum of
geometric variables (plus 1 for the first box), and since the probability generating function
for a geometric variable is $\frac{pt}{1 - qt}$, the probability generating function of X is

\[ P_X(t) = t \cdot \frac{\frac{5}{6}t}{1 - \frac{1}{6}t} \cdot \frac{\frac{4}{6}t}{1 - \frac{2}{6}t} \cdot \frac{\frac{3}{6}t}{1 - \frac{3}{6}t} \cdot \frac{\frac{2}{6}t}{1 - \frac{4}{6}t} \cdot \frac{\frac{1}{6}t}{1 - \frac{5}{6}t}. \]

This can be written as

\[ P_X(t) = \frac{5!\, t^6}{(6 - t)(6 - 2t)(6 - 3t)(6 - 4t)(6 - 5t)}. \]

The first few terms in a power series expansion of $P_X(t)$ are as follows:

\[ P_X(t) = \frac{5t^6}{324} + \frac{25t^7}{648} + \frac{175t^8}{2916} + \frac{875t^9}{11664} + \frac{11585t^{10}}{139968} + \frac{875t^{11}}{10368} + \frac{616825t^{12}}{7558272} + \cdots \]

Probabilities can be found from $P_X(t)$, but not at all easily without a computer algebra
system. The series above shows that the probability it takes 9 boxes in total to collect all
6 premiums is 875/11664 = 0.075.
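
Without a computer algebra system, the leading terms of such a series can still be obtained by multiplying truncated power series. The sketch below assumes each geometric factor pt/(1 − qt) is expanded as the series with coefficient p·q^(j−1) on t^j and that everything beyond t^12 is discarded; the names `geometric_series`, `series_mul`, and `MAX_POWER` are ours.

```python
from fractions import Fraction

MAX_POWER = 12  # keep terms up to t**12 only


def geometric_series(p, max_power):
    """Coefficients of p t / (1 - q t) up to t**max_power."""
    q = 1 - p
    return [Fraction(0)] + [p * q ** (j - 1) for j in range(1, max_power + 1)]


def series_mul(a, b, max_power):
    """Multiply two power series, discarding powers above max_power."""
    out = [Fraction(0)] * (max_power + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= max_power:
                out[i + j] += ai * bj
    return out


# The first box always yields a new premium: generating function t
pgf = [Fraction(0), Fraction(1)] + [Fraction(0)] * (MAX_POWER - 1)
for remaining in range(5, 0, -1):  # 5, 4, 3, 2, 1 premiums still missing
    factor = geometric_series(Fraction(remaining, 6), MAX_POWER)
    pgf = series_mul(pgf, factor, MAX_POWER)

for boxes in range(6, MAX_POWER + 1):
    print(boxes, pgf[boxes])
```

Truncation does not affect the retained coefficients, since a term of degree at most 12 can only arise from factors of degree at most 12.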
Figure 4.7 Probabilities for the cereal box problem. (Horizontal axis: Number of boxes, 6 through 30; vertical axis: Probability.)


A graph of the probability distribution function is shown in Figure 4.7. The probabil- 
ities shown there are the probabilities it takes n boxes to collect all 6 premiums. We will 
return to this problem and some of its variants in Chapter 7. 


EXERCISES 4.7 


1. Use the generating function for the binomial random variable with p = 2/3 to verify $\mu$
and $\sigma^2$.

2. In the cereal box problem, find $\sigma^2$ using a generating function.

3. (a) Find the probability generating function for a Poisson random variable with
parameter $\lambda$.
(b) Use the generating function in part (a) to find the mean and variance of a Poisson
random variable.


4. Use probability generating functions to show that the sum of independent Poisson
variables, with parameters $\lambda_x$ and $\lambda_y$, respectively, has a Poisson distribution with parameter
$\lambda_x + \lambda_y$.

5. A discrete random variable, X, has probability distribution function $f(x) = k/2^x$, x =
0, 1, 2, 3, 4.
(a) Find k.
(b) Find $P_X(t)$, the probability generating function.
(c) Use $P_X(t)$ to find the mean and variance of X.

6. Use the probability generating function to find the mean and variance of a negative
binomial variable with parameters r and p.

7. A fair coin is tossed eight times followed by 12 tosses of a coin loaded so as to come 
up heads with probability 3/4. What is the probability that 
(a) exactly 10 heads occur? 
(b) at least 10 heads occur? 

8. Use the probability generating function of a Bernoulli random variable to show that the 
sum of independent Bernoulli variables is a binomial random variable. 
9. A random variable has probability distribution function

\[ f(x) = \frac{1}{e \cdot x!}, \quad x = 0, 1, 2, 3, \ldots \]

(a) Find the probability generating function for X, $P_X(t)$.
(b) Use $P_X(t)$ to find the mean and variance of X.

10. Suppose a series of 10 binomial trials with probability 1/2 of success is conducted, 
giving x successes. These trials are followed by 8 binomial trials with probability x/10 
of success. Find the probability of exactly 6 successes in the entire series. 

11. Verify that the variance of X is 38.99 in the cereal box problem. 

12. Suppose X and Y are independent geometric variables with parameters $p_1$ and $p_2$,
respectively.

(a) Find the probability generating function for X + Y.

(b) Use the probability generating function to find P(X + Y = k) and then verify your
result by calculating the probability directly.


4.8 MOMENT GENERATING FUNCTIONS 


Another generating function that is commonly used in probability theory is the moment 
generating function. For a random variable X, this function generates the moments, $E(X^k)$,
for the probability distribution of X. If k = 1, the moment becomes E(X) or the mean of the
distribution. If k = 2, then the moment is $E(X^2)$, which we use in calculating the variance.

The word moment has a physical connotation. If we think of the probability distribution 
as being a very thin piece of material of area 1, then E(X) is the same as the center of gravity 
of the material and $E(X^2)$ is used in calculating the moment of inertia. Hence the name
moment for these quantities which we use to describe probability distributions. 

The extent to which we are successful in using the moments to describe probability 
distributions may be judged from certain considerations. If we were to specify E(X) as a 
value for a probability distribution this would certainly constrain the set of random variables 
X under consideration, but we could still be considering an infinite set of variables. Were we 
to specify $E(X^2)$ as well, this would narrow the set of possible random variables. A value
for $E(X^3)$ further narrows the set. For the examples we will consider, we ask the reader to
accept the fact that, were all the moments specified, X would be determined uniquely. 

Now let us see how this fact can be used. We begin with a definition of the moment 
generating function. 


Definition The moment generating function of a random variable X is

\[ M[X; t] = E[e^{tX}], \]

providing the expectation exists.

It follows that

\[ M[X; t] = \sum_{x} e^{tx} \cdot P(X = x), \quad \text{if X is discrete} \]

\[ M[X; t] = \int_{-\infty}^{\infty} e^{tx} \cdot f(x)\,dx, \quad \text{if X is continuous} \]

provided the sum or integral exists and where f(x) is the probability density function of X.
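First we show that M[X; t] does in fact generate moments. Consider the continuous
case so that

\[ M[X; t] = \int_{-\infty}^{\infty} e^{tx} \cdot f(x)\,dx. \]

Expanding $e^{tx}$ in a power series, we have

\[ M[X; t] = \int_{-\infty}^{\infty} \left(1 + \frac{tx}{1!} + \frac{t^2 x^2}{2!} + \frac{t^3 x^3}{3!} + \cdots\right) f(x)\,dx. \]

Use the fact that the integral of a sum is the sum of the integrals and factor out all the
powers of t to find that

\[ M[X; t] = \int_{-\infty}^{\infty} f(x)\,dx + \frac{t}{1!} \int_{-\infty}^{\infty} x \cdot f(x)\,dx + \frac{t^2}{2!} \int_{-\infty}^{\infty} x^2 \cdot f(x)\,dx + \frac{t^3}{3!} \int_{-\infty}^{\infty} x^3 \cdot f(x)\,dx + \cdots \]

so

\[ M[X; t] = 1 + \frac{t}{1!} \cdot E(X) + \frac{t^2}{2!} \cdot E(X^2) + \frac{t^3}{3!} \cdot E(X^3) + \cdots \]

providing that the series converges.

M[X; t] generates moments in the sense that the coefficient of $\frac{t^k}{k!}$ is $E(X^k)$.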

We used the derivatives of the probability generating function, $P_X(t)$, to calculate
E(X), E[X(X − 1)], E[X(X − 1)(X − 2)], ..., quantities that are often called factorial
moments. The moments defined above could be calculated from them. We did so on several
occasions to find the variance.

The derivatives of M[X; t] also have some significance. Since

\[ M'[X; t] = \frac{d}{dt} M[X; t] = E(X) + t \cdot E(X^2) + \frac{t^2}{2!} \cdot E(X^3) + \cdots \]

and

\[ M''[X; t] = \frac{d^2}{dt^2} M[X; t] = E(X^2) + t \cdot E(X^3) + \frac{t^2}{2!} \cdot E(X^4) + \cdots, \]

it is evident that $M'[X; 0] = E(X)$ and $M''[X; 0] = E(X^2)$. There are then two methods for
calculating moments: either a series expansion or the derivatives of M[X; t]. There are
in practice very few examples where each method is feasible; generally one method works
well while the other method presents difficulties. We turn now to some examples.


Example 4.8.1 


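For the uniform random variable, f(x) = 1, for 0 < x < 1. The moment generating function
is then

\[ M[X; t] = \int_{0}^{1} 1 \cdot e^{tx}\,dx = \left. \frac{1}{t} e^{tx} \right|_{0}^{1} = \frac{1}{t}\left(e^{t} - 1\right). \]

In this instance it is easy to express M[X; t] in a power series. Using the power series
for $e^t$ we find that

\[ M[X; t] = 1 + \frac{t}{2!} + \frac{t^2}{3!} + \frac{t^3}{4!} + \cdots \]

so $E(X^k) = \frac{1}{k + 1}$.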
However this is a fact that is much more easily found directly:

\[ E(X^k) = \int_{0}^{1} x^k\,dx = \frac{1}{k + 1}. \]

The moment generating function does little here but provide a very difficult way in which to 
find the moments. This is almost always the case; moment generating functions are rarely 
used to generate moments; it is almost always easier to proceed by definition. What then 
is the use of the moment generating function? The answer to this question is that we use 
it almost exclusively to establish the distributions of functions of random variables and the 
distributions of sums of random variables, basing our conclusions on the fact that moment 
generating functions are unique, that is, only one distribution has a given moment generating 
function. We will return to this point later. 
For now, continuing with the example, we found that 


\[ M[X; t] = \frac{1}{t}\left(e^{t} - 1\right). \]

If we differentiate this,

\[ M'[X; t] = \frac{t e^{t} - e^{t} + 1}{t^2}. \]

As t → 0 we use L’Hospital’s rule to find that

\[ M'[X; t] \to \frac{1}{2}, \]
so the process yields the correct result. This is without doubt the most difficult way in which 
to establish the fact that the mean of a uniform random variable on the interval (0, 1) is 1/2! 
Clearly, we have other purposes in mind; the fact is that the moment generating function 
is an extremely powerful tool. Facts can be established easily using it that are very difficult 
to establish in other ways. 
We continue with further examples since the generating functions themselves are of 
importance. 


Example 4.8.2 


Consider the exponential distribution $f(x) = e^{-x}$, x > 0. We calculate the moment
generating function:

\[ M[X; t] = \int_{0}^{\infty} e^{tx} \cdot f(x)\,dx = \int_{0}^{\infty} e^{tx} \cdot e^{-x}\,dx. \]

This can be simplified to

\[ M[X; t] = \int_{0}^{\infty} e^{-(1-t)x}\,dx = \left. -\frac{1}{1 - t} e^{-(1-t)x} \right|_{0}^{\infty} = \frac{1}{1 - t}, \quad \text{if } t < 1. \]

Again the power series is easy to find:

\[ M[X; t] = 1 + t + t^2 + t^3 + \cdots \]

establishing the fact that $E[X^k] = k!$, for k a positive integer. This is a nice way to establish
the fact that

\[ \int_{0}^{\infty} x^k e^{-x}\,dx = k!, \]

which arose earlier when the gamma distribution was considered.

The reader may also want to show that the moment generating function for
$f(x) = \lambda e^{-\lambda x}$, x > 0, is

\[ M[X; t] = \frac{\lambda}{\lambda - t}. \]


Example 4.8.3 


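The moment generating function for a normal random variable is by far our most important
result, as will be seen later. We use here a standard normal distribution:

\[ M[X; t] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx} \cdot e^{-x^2/2}\,dx. \]

The simplification of this integral takes some manipulation. Consider the exponent

\[ tx - \frac{x^2}{2} = -\frac{1}{2}\left(x^2 - 2tx + t^2\right) + \frac{t^2}{2} = -\frac{1}{2}(x - t)^2 + \frac{t^2}{2} \]

by completing the square. This means that the generating function can be written as

\[ M[X; t] = e^{t^2/2} \cdot \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x - t)^2/2}\,dx. \]

The integral, together with the factor $1/\sqrt{2\pi}$, is 1 since it represents the area beneath
a normal curve with mean t and variance 1.

It follows that

\[ M[X; t] = e^{t^2/2}. \]

We can also find a power series for this generating function as

\[ M[X; t] = 1 + \frac{t^2}{2} + \frac{1}{2!}\left(\frac{t^2}{2}\right)^2 + \frac{1}{3!}\left(\frac{t^2}{2}\right)^3 + \cdots \]

It follows that $E(X^k) = 0$ if k is odd and

\[ E(X^{2k}) = \frac{(2k)!}{2^k\, k!} \quad \text{for } k = 1, 2, 3, \ldots \]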
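
The moments can be read from the series mechanically: the coefficient of $t^{2j}$ in $e^{t^2/2}$ is $1/(2^j j!)$, and multiplying by $(2j)!$ gives $E(X^{2j})$. A small sketch of this bookkeeping, with a function name of our own, is:

```python
from fractions import Fraction
from math import factorial


def standard_normal_moment(k):
    """E(X**k) for a standard normal X, read from the series for exp(t**2 / 2)."""
    if k % 2 == 1:
        return Fraction(0)  # the series contains no odd powers of t
    j = k // 2
    coefficient = Fraction(1, 2 ** j * factorial(j))  # coefficient of t**k
    return coefficient * factorial(k)                 # moment = k! * coefficient


print([standard_normal_moment(k) for k in range(1, 9)])
```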


Moment generating functions for other commonly occurring distributions will be estab- 
lished in the exercises. 
EXERCISES 4.8


1. Verify that the moment generating function for $f(x) = 3e^{-3x}$ is $\frac{3}{3 - t}$.

2. Use the moment generating function in Exercise 1 to find $\mu$ and $\sigma^2$.

3. Show that if $P_X(t)$ is the probability generating function for a random variable X, then
$M[X; t] = P_X(e^t)$.

4. (a) Find the moment generating function for a binomial random variable with
parameters n and p.
(b) Take the case n = 2 and expand the moment generating function to find $E(X^2)$ from
this expansion.
(c) Use the moment generating function to find the mean and variance of a binomial
random variable.

5. (a) Find the moment generating function for a Poisson random variable with parameter
$\lambda$.
(b) Use the moment generating function to find the mean and variance of a Poisson
random variable.

6. Find the moment generating function for an exponential random variable with mean $\lambda$.

7. A random variable has moment generating function M[X; t] = [expression illegible in source].
(a) Find the mean and variance of X.
(b) Find the first five terms in the power series expansion of M[X; t].

8. A random variable, X, has probability distribution function f(x) = k · [expression illegible in source].
(a) Find k.
(b) Find the moment generating function and show the first five terms in its power
series expansion.
(c) Find the mean and variance of X from the moment generating function in two ways.

9. (a) Find the moment generating function for a gamma distribution.
(b) Use the moment generating function to find the mean and variance for the gamma
distribution.

10. (a) Find the moment generating function for a $\chi^2$ random variable.
(b) Use the moment generating function to find the mean and variance of a $\chi^2$ random
variable.

11. A random variable X has a probability density function f(x) defined piecewise on
−2 < x < −1 and 1 < x < 4 [values illegible in source]. Find the moment generating function for X.

12. Find $E[X^k]$ for a random variable whose moment generating function is $e^{t^2/2}$.

13. Random variable X has moment generating function M[X; t] = [expression illegible in source].
(a) Find P(7 < X < 12).
(b) Find the probability density function for Y = 3X.

14. A random variable X has the probability density function f(x) = [expression illegible in source].
(a) Show that $M[X; t] = (1 - t^2)^{-1}$.
(b) Find the mean and variance of X.

15. Suppose that X is a uniformly distributed random variable on [2, 3].
(a) Find the moment generating function for X.
(b) Expand M[X; t] in an infinite series and from this series find $\mu_x$ and $\sigma_x^2$.

16. Find the moment generating function for X if f(x) = 2x, 0 < x < 1. Then use the
moment generating function to find $\mu_x$ and $\sigma_x^2$.

17. The moment generating function for a random variable X is [expression illegible in source].
(a) Find the mean and variance of X.
(b) What is the probability distribution for X?

18. The moment generating function for a random variable X is [expression illegible in source]. Find the mean and
variance of X.


4.9 PROPERTIES OF MOMENT GENERATING 
FUNCTIONS 


A primary use of the moment generating function is in determining the distributions of 
functions of random variables. It happens that the moment generating function for linear 
functions of X is easily related to the moment generating function for X. 


Theorem: 


(a) M[cX; t] = M[X; ct].
(b) $M[X + c; t] = e^{ct} M[X; t]$, where c is a constant.


Proof


(a) $M[cX; t] = E(e^{t \cdot cX}) = E(e^{(ct)X}) = M[X; ct]$.
(b) $M[X + c; t] = E[e^{(X + c)t}] = E[e^{Xt} \cdot e^{ct}] = e^{ct} E[e^{Xt}] = e^{ct} M[X; t]$.


So multiplying the variable by a constant simply multiplies t by the constant in the
generating function; adding a constant multiplies the generating function by $e^{ct}$.
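
Both parts of the theorem are easy to check numerically for a particular distribution. The following sketch uses one fair die and arbitrary illustrative values c = 2 and t = 0.3; these values and the helper name `mgf` are our own choices and carry no special significance.

```python
from math import exp


def mgf(distribution, t):
    """M[X; t] = E(e^{tX}) for a discrete distribution given as (value, probability) pairs."""
    return sum(p * exp(t * x) for x, p in distribution)


die = [(x, 1 / 6) for x in range(1, 7)]  # one fair die
c, t = 2.0, 0.3                          # arbitrary illustrative values

scaled = [(c * x, p) for x, p in die]    # distribution of cX
shifted = [(x + c, p) for x, p in die]   # distribution of X + c

# (a) M[cX; t] = M[X; ct]
print(mgf(scaled, t), mgf(die, c * t))
# (b) M[X + c; t] = e^{ct} M[X; t]
print(mgf(shifted, t), exp(c * t) * mgf(die, t))
```

Each printed pair should agree up to floating-point rounding.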


Example 4.9.1 


We use the earlier theorem to find the moment generating function for a N($\mu$, $\sigma$) random
variable from the generating function for a N(0, 1) random variable. Let $Z = \frac{X - \mu}{\sigma}$. Since
$M[Z; t] = e^{t^2/2}$, it follows that

\[ M\left[\frac{X - \mu}{\sigma}; t\right] = M\left[X - \mu; \frac{t}{\sigma}\right] = e^{-\mu t/\sigma} M\left[X; \frac{t}{\sigma}\right], \]

from which it follows that

\[ M\left[X; \frac{t}{\sigma}\right] = e^{\frac{t^2}{2} + \frac{\mu t}{\sigma}}. \]

We conclude from this that

\[ M[X; t] = e^{\mu t + \frac{\sigma^2 t^2}{2}}. \]

Therefore if a random variable X has, for example, $M[X; t] = e^{\frac{3}{4}t + \frac{t^2}{3}}$, then X is normal
with mean 3/4 and variance 2/3. A remarkable fact, and one we use frequently here, is that,
if X and Y are independent, then

\[ M[X + Y; t] = M[X; t] \cdot M[Y; t]. \]


To indicate why this is true, we start with M[X + Y; 1] = E[e®+”"] = E[e® - e'”], 
which is the expectation of the product of two functions. If we can show that the expectation 
of the product is the product of the expectations, then 


MIX + Y;t] = Ele - e”] = Efe™] - E[e”] = MIX; t] - MY; 4]. 


As a partial explanation of the fact that the expectation of a product of independent ran- 
dom variables is the product of the expectations, consider X and Y as discrete independent 
random variables. Then 


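$$E(X \cdot Y) = \sum_x \sum_y x \cdot y \cdot P(X = x \text{ and } Y = y).$$


But if X and Y are independent, then

$$P(X = x \text{ and } Y = y) = P(X = x) \cdot P(Y = y)$$

and so

$$E(X \cdot Y) = \sum_x \sum_y x \cdot y \cdot P(X = x \text{ and } Y = y)$$

$$= \sum_x \sum_y x \cdot y \cdot P(X = x) \cdot P(Y = y)$$

$$= \sum_x x \cdot P(X = x) \cdot \sum_y y \cdot P(Y = y) = E(X) \cdot E(Y).$$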


So it is plausible that the expectation of the product of independent random variables 
is the product of their expectations and we accept the fact that 


$$M[X + Y; t] = M[X; t] \cdot M[Y; t]$$


if X and Y are independent. 
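
The product rule can be verified exactly for a small discrete case. This is a minimal sketch, assuming X is a fair die and Y is a fair coin coded 0 or 1, both chosen only for illustration; exact fractions are used for the expectations and floating point for the generating functions.

```python
from fractions import Fraction
from itertools import product
import math

die = {x: Fraction(1, 6) for x in range(1, 7)}   # X: fair six-sided die
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}    # Y: fair coin coded 0/1

# Joint distribution built from independence: P(X=x and Y=y) = P(X=x) P(Y=y)
joint = {(x, y): px * py
         for (x, px), (y, py) in product(die.items(), coin.items())}

e_x = sum(x * p for x, p in die.items())
e_y = sum(y * p for y, p in coin.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())

def mgf(dist, t):
    """Moment generating function of a discrete distribution at t."""
    return sum(float(p) * math.exp(t * v) for v, p in dist.items())

t = 0.4
mgf_sum = sum(float(p) * math.exp(t * (x + y)) for (x, y), p in joint.items())

print(e_xy, e_x * e_y)                       # both equal 7/4
print(mgf_sum, mgf(die, t) * mgf(coin, t))   # agree up to rounding
```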


We will return to this point in Chapter 5 when we consider bivariate probability distri- 
butions. For now we will make use of the result to establish some surprising results. 


4.10 SUMS OF RANDOM VARIABLES — II


We have used the facts that E(X + Y) = E(X) + E(Y) and, if X and Y are independent, that 
Var(X + Y) = Var(X) + Var(Y), but these facts do not establish the distribution of X + Y. We
now turn to determining the distribution of the sums of two or more independent random
variables; our solution here will show the power and usefulness of the moment generating 
function. We will return to this subject in Chapter 5 where we can demonstrate another 
procedure for finding the probability distribution of sums of random variables. Here we use 
the fact that the moment generating function for a sum of independently distributed random 
variables is the product of the individual generating functions. 


Example 4.10.1 Sums of Normal Random Variables 


It is probably not surprising to find that sums of independent normal variables are also 
normal. The proof of this is now easy: If X and Y are independent normal variables, 


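$$M[X + Y; t] = M[X; t] \cdot M[Y; t].$$


The exponent in the product on the right above is


$$\mu_x t + \frac{\sigma_x^2 t^2}{2} + \mu_y t + \frac{\sigma_y^2 t^2}{2}.$$


This can be rearranged as


$$(\mu_x + \mu_y) t + \frac{(\sigma_x^2 + \sigma_y^2) t^2}{2},$$


showing that $X + Y \sim N[\mu_x + \mu_y, \sigma_x^2 + \sigma_y^2]$. Note that the mean and variance of the sum can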
be established in other ways. The argument above establishes the normality which otherwise 
would be very difficult to show. 

However the big surprise is that sums of non-normal variables also become normal. 
We will explain this fully in Section 4.11, but the reader may note that this may explain 
the frequency with which we have seen the normal distribution up to this point. For the 
moment, we continue with another example. 


Example 4.10.2 Sums of Exponential Random Variables 


We begin with a decidedly non-normal random variable, namely an exponential variable 
where we take the mean to be 1. So 


$$f(x) = e^{-x}, \quad x > 0.$$


We know that

$$M[X; t] = (1 - t)^{-1}.$$


It follows that the moment generating function of the sum of two independent exponential
random variables is

$$M[X + Y; t] = (1 - t)^{-2}.$$


This, however, is the moment generating function of $f(x) = x e^{-x}$, $x > 0$. The graph of
this distribution is shown in Figure 4.8.

Now consider the sum of three independent exponential random variables. The
moment generating function is $M[X + Y + Z; t] = (1 - t)^{-3}$. A computer algebra system,
or otherwise, shows that this is the moment generating function for $f(x) = \frac{x^2}{2} e^{-x}$, $x > 0$.
Figure 4.9 shows a graph of this distribution.

Figure 4.8 Sum of two independent exponential random variables.

Figure 4.9 Sum of three independent exponential variables.


We now strongly suspect the occurrence of normality if we were to add more variables. 
We know that 
$$M[X_1 + X_2 + X_3 + \cdots + X_n; t] = (1 - t)^{-n}.$$


This is the moment generating function for the gamma distribution,
$f(x) = \frac{1}{\Gamma(n)} x^{n-1} e^{-x}$, $x > 0$. Since the mean and variance of each of the $X_i$'s above is 1, we
can consider $X = \sum_{i=1}^{n} X_i$ and $Z = \frac{X - n}{\sqrt{n}}$. Then,

$$M[Z; t] = M\left[\frac{X - n}{\sqrt{n}}; t\right] = e^{-nt/\sqrt{n}} \left(1 - \frac{t}{\sqrt{n}}\right)^{-n}.$$


The behavior of this is most easily found using a computer algebra system. We expand
$M[Z; t]$ and then let $n \to \infty$. We find that

$$\lim_{n \to \infty} M[Z; t] = e^{t^2/2},$$

showing that Z approaches the standard normal distribution. Some details of this calculation
are given in Appendix A. This establishes the fact that the sums of independent exponential 
variables approach a normal distribution. 
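
The limit can also be watched numerically without a computer algebra system. The sketch below, with a hypothetical helper `log_mgf_z`, evaluates the logarithm of $e^{-nt/\sqrt{n}}(1 - t/\sqrt{n})^{-n}$ for increasing n and compares the result with $e^{t^2/2}$; working with logarithms is only a device to avoid overflow and is not taken from the text.

```python
import math

def log_mgf_z(n, t):
    """log M[Z; t] for Z = (X - n)/sqrt(n), X a sum of n mean-1 exponentials.

    Only meaningful for t < sqrt(n), where the generating function exists.
    """
    root = math.sqrt(n)
    return -root * t - n * math.log1p(-t / root)

t = 1.0
target = math.exp(t * t / 2)   # standard normal generating function at t
for n in (4, 25, 100, 10_000, 1_000_000):
    print(n, math.exp(log_mgf_z(n, t)), target)
```

The printed values decrease toward the target as n grows, in line with the expansion $\log M[Z; t] = t^2/2 + t^3/(3\sqrt{n}) + \cdots$.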

In the beginning of this chapter we indicated that the sums showing on n dice—the 
sums of independent discrete uniform random variables—became normal, although we 
lacked the techniques for proving this at that time. The proof of this will not be shown here, 
but we note that the process followed in Example 4.10.2 will work in this case. Now we 
know that the distribution of sums of independent exponential variables also approaches 
the normal distribution. That the distribution of sums of widely different summands
approaches the normal distribution is perhaps one of the most surprising facts in
mathematics. Why normality should occur for a wide range of random variables is
investigated in the next section.


EXERCISES 4.10 


1. For the uniform random variable f(x) = 1, 0 < x < 1,
(a) Find the moment generating function.
(b) Find the mean and variance of the sum of 3 uniformly distributed random variables.
2. Expand the moment generating function in Exercise 1 and verify the mean and variance.
3. The moment generating function for a random variable X is M[X; t] = =. 
(a) Find the mean and variance of X. 
(b) Identify the probability distribution for X. 
4. Random variable X has M[X; t] = G + Ley 
(a) Find the mean and variance of X. 
(b) Identify the probability distribution for X. 
5. Find the mean and variance for the random variable whose moment generating function 
is M(Z;t) = (1 —20)>. 
6. Find the moment generating function for the exponential random variable whose
probability density function is $f(x) = 2e^{-2x}$, $x > 0$.
7. Suppose X ~ N(36, 1/10) and Y ~ N(15, 6). If X and Y are independent, find P(X + 
Y > 43). 
8. A random variable X has probability density function $f(x) = x e^{-x}$, $x > 0$.
(a) Find the moment generating function, $M[X; t]$.
(b) Use $M[X; t]$ to find $\mu$ and $\sigma^2$.
(c) Find a formula for $E(X^k)$.
9. Find the variance of a random variable whose moment generating function is M[X; t] = 
d—-atl. 
10. Explain why the function 2 + = cannot be the moment generating function for any 
random variable. 


11. What is the probability distribution for a random variable whose moment generating
function is
Mix 9

12. Identify the random variable whose moment generating function is
1\ 16 1\ 16
MIX11 = (5) eo. (1+e3)

13. Show that the sum of two independent binomial random variables with parameters n
and p, and m and p, respectively, is binomial with parameters n + m and p.

14. Show that the sum of independent Poisson random variables with parameters $\lambda_x$ and
$\lambda_y$, respectively, is Poisson with parameter $\lambda_x + \lambda_y$.

15. Show that the sum of independent $\chi^2$ random variables is a $\chi^2$ random variable.

16. Show that a Poisson variable with parameter $\lambda$ becomes normal as $\lambda \to \infty$. [Hint: Find
the limit of $M\left[\frac{X - \lambda}{\sqrt{\lambda}}; t\right]$.]

17. (a) If X is uniformly distributed on [0, 1], show that $M[X; t] = \frac{1}{t}(e^t - 1)$.
(b) Suppose that $X_1$ and $X_2$ are independent observations from the uniform distribution
in part (a). Find $M[X_1 + X_2; t]$.
(c) Let Y be a random variable with the probability density function
g(y) = y, 0 < y < 1,
g(y) = 2 − y, 1 < y < 2.
Find the moment generating function for Y.
(d) What can be concluded from parts (b) and (c) above?

18. Suppose that $X_i$, i = 1, 2, 3, ..., n, is each exponentially distributed with means $1/\lambda_i$,
respectively. Let $S = X_1 + X_2 + \cdots + X_n$. Find the moment generating function for S.

19. Suppose that X and Y are independent random variables with probability density
functions
f(x) = 1, 0 < x < 1, and
g(y) = 1, −1 < y < 0.
Find $M[X + Y; t]$.

20. If M[X; t] = e740), what is $M[15 - 3X; t]$?

21. The price asked for a security is normally distributed with a mean of $50 and standard
deviation of $5. Buyers are willing to pay an amount that is also normally distributed
with a mean of $45 and a standard deviation of $2.50. What is the probability a
transaction will take place?

22. A rod is made up of five sections. A study of the individual sections shows that the end
sections have mean lengths of 1.001 in. and the three middle sections have mean lengths
of 1.999 in. each. The standard deviation of the length of each section is 0.004 in. If
random assembly is employed,
(a) what will be the average length of the assembled rods?
(b) what will be the standard deviation of the assembled lengths?
(c) what is the probability the assembled rod will have length in excess of 8.002 in.?

23. Show that the binomial random variable with parameters n and p becomes normal as
$n \to \infty$.




24. Two independent observations, X and Y, are selected from a probability distribution
with
f(x) = 2x, 0 < x < 1.

(a) Find the moment generating function for the sum, Z = X + Y.
(b) Find E(Z*).

25. Let S denote the sum of r independent exponential random variables, each with
expectation $1/\alpha$. Show that $2\alpha S$ has a $\chi^2_{2r}$ distribution.


26. Show that the moment generating function for the gamma distribution

$$f(x) = \frac{1}{\Gamma(n)} x^{n-1} e^{-x}, \quad x > 0$$

is $(1 - t)^{-n}$.


4.11 THE CENTRAL LIMIT THEOREM


We have had numerous examples of sums which approach normality as the number of 
summands increases. We now want to consider means, which are multiples of sums, of 
random variables and consider the limiting distribution of such averages. 


Theorem 1: If $\bar{X}$ denotes the mean of n observations of a random variable X with mean $\mu$
and variance $\sigma^2$, then the limiting distribution of $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is N(0, 1) provided X has a moment
generating function.


The theorem indicates that the probability distribution of the random variable $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
approaches the N(0, 1) probability distribution. The result is known as the central limit
theorem. Actually there is a class of theorems known as central limit theorems in probability,
but since this is the only one we will consider, we will refer to it uniquely and call it the
central limit theorem. We now indicate a proof.

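Since we presume that X has a moment generating function, let this moment generating
function be

$$M[X; t] = 1 + \mu_1 t + \frac{\mu_2 t^2}{2!} + \frac{\mu_3 t^3}{3!} + \cdots$$

where, for convenience, $\mu_k$ denotes $E(X^k)$. Now, since the n observations are independent,

$$M[\bar{X}; t] = \left( M\left[X; \frac{t}{n}\right] \right)^n, \quad \text{so that} \quad \log M[\bar{X}; t] = n \cdot \log M\left[X; \frac{t}{n}\right].$$

Using the series expansion $\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots$, where $|x| < 1$, we find

$$\log M\left[X; \frac{t}{n}\right] = \left( \frac{\mu_1 t}{n} + \frac{\mu_2 t^2}{2! n^2} + \frac{\mu_3 t^3}{3! n^3} + \cdots \right) - \frac{1}{2} \left( \frac{\mu_1 t}{n} + \frac{\mu_2 t^2}{2! n^2} + \frac{\mu_3 t^3}{3! n^3} + \cdots \right)^2 + \cdots$$

So $n \cdot \log M[X; t/n] = \log M[\bar{X}; t]$ simplifies, since $\mu_1 = \mu$ and $\mu_2 - \mu_1^2 = \sigma^2$, to

$$\log M[\bar{X}; t] = \mu t + \frac{\sigma^2 t^2}{2n} \quad \text{plus terms that approach 0 as } n \to \infty.$$

This shows that $M[\bar{X}; t] \to e^{\mu t + \sigma^2 t^2/(2n)}$, the moment generating function for a normal
curve with mean $\mu$ and variance $\sigma^2/n$.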

This explains many of the normal-like graphs we have encountered previously. If the 
variables are sums, or means, of variables with moment generating functions (as all of ours 
have been), we expect normality as the number of summands increases. This is exactly 
what has happened. This phenomenon was encountered in Section 4.10 where we examined 
sums of independent exponential random variables and found that these approach a normal 
probability distribution. 

This also explains why the normal curve can be regarded as something that is “nor- 
mal” in the sense that it is usual or expected. It can, in fact, be generated from almost any 
probability distribution by taking sums or forming averages. 

We will show some very important statistical applications of this result in the remaining 
sections in this chapter. 


Example 4.11.1 


We noticed in Chapter 2 that the graphs of the binomial distributions we considered became 
normal-like as the number of trials increased; we also promised a full explanation of the 
fact that binomial curves with large values for n can be approximated by normal curves. 
We will do this by using the technique described earlier, namely by finding the limiting 
behavior of the moment generating function. 

Let X be a binomial random variable with parameters n and p. Then the moment
generating function of X is

$$M[X; t] = (q + p e^t)^n.$$

As usual we let $Z = \frac{X - \mu}{\sigma}$ so that

$$M[Z; t] = e^{-\mu t/\sigma} \cdot (q + p e^{t/\sigma})^n.$$

So

$$\log M[Z; t] = -\frac{\mu t}{\sigma} + n \log\left[ 1 + p \left( \frac{t}{\sigma} + \frac{t^2}{2! \sigma^2} + \frac{t^3}{3! \sigma^3} + \cdots \right) \right]$$

$$= -\frac{\mu t}{\sigma} + n \left[ p \left( \frac{t}{\sigma} + \frac{t^2}{2! \sigma^2} + \frac{t^3}{3! \sigma^3} + \cdots \right) - \frac{p^2}{2} \left( \frac{t}{\sigma} + \frac{t^2}{2! \sigma^2} + \frac{t^3}{3! \sigma^3} + \cdots \right)^2 + \cdots \right].$$

Now, using the facts that $\mu = n \cdot p$ and that $\sigma^2 = n \cdot p \cdot q$, we find that

$$\log(M[Z; t]) = \frac{t^2}{2} + \text{terms that approach 0 as } n \to \infty.$$

It follows that $M[Z; t]$ approaches the moment generating function of the standard normal
random variable.

This justifies our use of the normal distribution in approximating the binomial distribution
for large values of n, although computer algebra systems allow us to compute binomial
probabilities exactly for values of n that occur in most practical cases.

The central limit theorem has wide application to the statistical analysis of data. We
will show some of the statistical applications of this result in the remaining sections of this
chapter.
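
The quality of the normal approximation to the binomial can be examined directly. The following is a minimal sketch using only the Python standard library; the value p = 0.3, the choice of k about one standard deviation above the mean, and the continuity correction of 0.5 are assumptions made for illustration and are not taken from the derivation above.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for a binomial(n, p) random variable."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with a continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

p = 0.3
for n in (10, 100, 1000):
    # k is roughly one standard deviation above the mean
    k = math.floor(n * p + math.sqrt(n * p * (1 - p)))
    print(n, k, binom_cdf(k, n, p), normal_approx_cdf(k, n, p))
```

The two probabilities in each row should move closer together as n increases.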


EXERCISES 4.11 


1. Show that, if X is normal with mean $\mu$ and variance $\sigma^2$, then $\bar{X}$ is normal with mean $\mu$
and variance $\sigma^2/n$, where $\bar{X}$ is the mean of n of the X's.

2. Use the central limit theorem to approximate the probability that the sum on 12 fair
dice is 38 and then compare the approximation to the exact value.

3. Approximate the probability that the sum of 8 observations taken from an exponential
distribution with the mean 2 exceeds 5 by using the central limit theorem.

4. Light fixtures in a warehouse contain bulbs whose life lengths are exponential with a
mean of 720 hours. When a light burns out, it is immediately replaced with a new bulb.
(a) What is the probability that three bulbs last at least 2000 hours?
(b) If we want the bulbs on hand to last at least 3500 hours with probability 0.95,
how many bulbs should be stocked?

5. Suppose that X is a random variable with an unknown mean, $\mu$, but its variance is
known to be 100. How many observations of X must be taken so that the probability $\bar{X}$
is within 2 units of $\mu$ is 0.99?

6. The components in a system are known to have R(1000 hours) = 0.91, where R denotes
the reliability function.
(a) Approximate the reliability of a system of 100 such components if at least 70 of
the components must function at least 1000 hours.
(b) How many components must be installed in the system if at least 90 components
must last at least 1000 hours with probability 0.98?

7. Articles are shipped in lots of 1000 items. It is known that the probability an item is
defective is 0.04. Presume that the production process follows the assumptions of the
binomial distribution.
(a) Approximate the probability that in 100 lots, the average number of defectives is
less than 39.5.
(b) Now suppose we would like the probability that the average number of defectives
in 100 lots is less than 39.5 to be 0.10. What should the size of each lot be?

8. Traffic accidents at an intersection follow a Poisson distribution with 40 accidents
expected per year. Approximate the probability of at most 55 accidents in a given year
at that intersection.

9. An elevator can carry a maximum of 1575 lb. What is the probability that 10 people
will overload the elevator if their weights are random selections from N(150, 10)?

10. A graduating class has 200 graduates. Assume each graduate invites two guests who
attend, independently, with the probability of 0.8. How many seats for guests should
be provided at commencement if they desire to be 99% confident of seating everyone?

11. A machine turns out precision bolts whose lengths may be regarded as a normal random
variable with mean 6 and variance 0.0036. To check on whether or not the machine is
in control, 36 bolts are randomly selected from each day's production. The machine
is considered to be under control if the mean of these lengths falls between 5.970 and
6.015. What is the probability a sample will fail to meet this criterion, even though the
machine is under control?

12. At a local discount store, service times at the checkout counter are observed to be
normally distributed with mean 3.5 minutes and variance 1.44 min².
(a) Find the probability a customer takes more than 5 minutes to check out.
(b) A customer has been checking out for 3 minutes. What is the probability it will
take at least 5 minutes for the entire process?
(c) What is the probability that the next 6 customers check out in a total of 20 minutes
or less?

13. One hundred bolts are packed in a box. The weight of a bolt has mean 1 oz. and standard
deviation 0.1 oz. Approximate the probability a box weighs more than 102 oz.

14. A candy maker produces mints that have a label weight of 20.4 g, but the actual
distribution of the weights has $\mu = 21.37$ g and $\sigma^2 = 0.16$ g². Let $\bar{X}$ be the mean weight of
a sample of 36 units. Find $P(21.21 < \bar{X} < 21.45)$.

15. Let X be a normal random variable with mean 1 and variance 16.
(a) What is the probability an observation is within 2 units of the mean?
(b) What is the probability that the mean of 4 observations is within 2 units of the
mean?

16. Civil engineers believe that W, the weight (in units of 1000 lb) the span of a bridge can
withstand without structural damage resulting is 78.5. Suppose that the weight (again
in units of 1000 lb) of a car is a random variable with a mean of 3 and a standard deviation of
0.3. How many cars can be allowed on the bridge span for the probability that structural
damage will not occur to be 0.99?

17. In Example 2.3.1.2 we remarked that a deviation of more than 0.11 in the average result
when a fair die is thrown 1000 times is highly unlikely. Show that this is true.


4.12 WEAK LAW OF LARGE NUMBERS


If we have a production process that is producing a defective item with probability p, we 
have an intuitive notion that we can discover the value of p, at least within a given range, 
if we observe the production process long enough. If we have a distribution with unknown
mean, $\mu$, we have a similar belief, namely that we can determine $\mu$, again with a given
accuracy, if we take a large enough sample and compute the mean of the sample. It is
reasonable to believe that the mean of this sample ought to be close to $\mu$. These ideas are
actually correct and we examine mathematical demonstrations of them here.

We might even refer to these results as a Law of Averages, although the literature of 
probability generally refers to these results as the Weak Law of Large Numbers. 

We consider the second problem first. Suppose that $X_1, X_2, X_3, \ldots, X_n$ is a random sample
from some distribution with finite mean and variance, say $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$
for i = 1, 2, ..., n. From the properties of the expectation and variance of a sum of
independent random variables, $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$.

Now we can apply Tchebycheff's inequality to find that

$$P\left[ |\bar{X} - \mu| < k \cdot \frac{\sigma}{\sqrt{n}} \right] \ge 1 - \frac{1}{k^2} \quad \text{for any } k > 0.$$

Now let $\epsilon = k \cdot \frac{\sigma}{\sqrt{n}}$; then the inequality above becomes

$$P[|\bar{X} - \mu| < \epsilon] \ge 1 - \frac{\sigma^2}{n \cdot \epsilon^2}.$$

As $n \to \infty$, $P[|\bar{X} - \mu| < \epsilon] \to 1$ even when $\epsilon$ is arbitrarily small. So the probability
that $\bar{X}$ and $\mu$ are arbitrarily close approaches 1. This is a verification of our conjecture that
a sample mean can be made arbitrarily close to the population mean as the sample size
increases.

For the first conjecture, let $p_s$ denote the sample proportion of defective items chosen
from a production process that is producing defective items with probability p. We know that
$E(p_s) = p$ and that $\mathrm{Var}(p_s) = \frac{p \cdot q}{n}$, where n is the sample size. Again applying Tchebycheff's
inequality, we find that

$$P\left[ |p_s - p| < k \cdot \sqrt{\frac{p \cdot q}{n}} \right] \ge 1 - \frac{1}{k^2} \quad \text{for any } k > 0.$$

Now if $\epsilon = k \cdot \sqrt{\frac{p \cdot q}{n}}$, the inequality becomes

$$P[|p_s - p| < \epsilon] \ge 1 - \frac{p \cdot q}{n \cdot \epsilon^2}.$$

So $p_s$ and p can be made arbitrarily close as n becomes large with probability approaching 1.

Figure 4.10 Simulation illustrating the weak law of large numbers.


A computer simulation provides some concrete evidence of the above statement. 
Figure 4.10 shows the result of 100 samples of size 100 each, drawn from a binomial 
population with p = 0.38. 

The horizontal axis shows the number of samples while the vertical axis displays the 
cumulative ratio of successes to the total number of trials. While the initial values exhibit 
fairly large variation, the later ratios are very close to 0.38 as we expect. 
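
A simulation of this kind is easy to reproduce. The sketch below is not the program behind Figure 4.10; it is a minimal illustration, assuming 100 samples of 100 Bernoulli trials with p = 0.38 and an arbitrarily chosen seed, that tracks the cumulative ratio of successes and also evaluates the Tchebycheff lower bound $1 - pq/(n\epsilon^2)$ derived above.

```python
import random

random.seed(1)           # arbitrary fixed seed so that a run can be repeated
p, samples, size = 0.38, 100, 100

successes = trials = 0
ratios = []              # cumulative ratio of successes after each sample
for _ in range(samples):
    successes += sum(random.random() < p for _ in range(size))
    trials += size
    ratios.append(successes / trials)

def chebyshev_lower_bound(p, n, eps):
    """Lower bound 1 - pq/(n eps^2) on P(|p_s - p| < eps)."""
    return 1 - p * (1 - p) / (n * eps**2)

print(ratios[0], ratios[9], ratios[-1])
print(chebyshev_lower_bound(p, trials, 0.02))
```

With all 10,000 trials and $\epsilon = 0.02$ the bound is about 0.94, so the final ratio is guaranteed to be within 0.02 of 0.38 with at least that probability.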

The convergence in statements such as $P[|\bar{X} - \mu| < \epsilon] \to 1$, indicating that the
probability that a sample mean lies within any fixed distance of the population value
approaches 1, is referred to as convergence in probability. It differs from the convergence
usually encountered in calculus, which is normally pointwise.


4.13 SAMPLING DISTRIBUTION OF THE SAMPLE 
VARIANCE 


The remainder of this chapter will be devoted to data analysis, so we now turn to some
statistical applications of the theory presented to this point. In particular we want to
investigate hypothesis tests, some confidence intervals and the analysis of data arising in many
practical situations. We will also examine the theory of least squares as it applies to fitting
a linear function to data.

The central limit theorem indicates that the probability distribution of sample means
drawn from a variety of populations is approximately normal even for samples of
moderate size. The probability distributions of other quantities calculated from samples
(usually referred to as statistics) do not have such simple distributions and, in addition, are
often seriously affected by the type of probability distribution from which the samples
come.

In this section we determine the probability distribution of the sample variance. Other 
statistics will become of importance to us, and we will consider their probability distribu- 
tions when they arise. It is worth considering the sample variance by itself first. 

First we define the sample variance for a sample $x_1, x_2, \ldots, x_n$ as

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad \text{where } \bar{x} \text{ is the mean of the sample.}$$

The formula may also be written as

$$s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n-1)}.$$

Clearly there is some relationship between $s^2$ and $\sigma^2$. The divisor of n − 1 may be
puzzling; but, as we will presently show, this is chosen so that $E(s^2) = \sigma^2$. Since $E(s^2) = \sigma^2$,
$s^2$ is called an unbiased estimator of $\sigma^2$. If a divisor of n had been chosen, the expected
value of the sample variances thus calculated would not be the population value $\sigma^2$.

Now let us consider a specific example. Consider all the possible samples of size 3, 
chosen without replacement, from the discrete uniform distribution on the set of integers 
{1,2,...,20}. We calculate the sample variance for each sample. Each sample variance is 


calculated using

$$s^2 = \frac{1}{2} \sum_{i=1}^{3} (x_i - \bar{x})^2 \quad \text{where } \bar{x} = \frac{x_1 + x_2 + x_3}{3}.$$

The probability distribution of $s^2$ in part is as follows. Permutations of the samples
have been ignored.


s^2          1    7/3   4    13/3   19/3   ...   301/3   307/3   313/3   109   343/3
1140·Prob.   18   34    16   32     30     ...   2       4       2       2     2


The complete distribution is easy to work out with the aid of a computer algebra system. 
There are 83 possible values for s*. A graph of the distribution of these values is shown in 
Figure 4.11. 
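
The enumeration is short enough to carry out in a general-purpose language as well. The following is a minimal sketch using exact fractions; the function name `sample_variance` is an illustrative choice. It counts how often each value of $s^2$ occurs among the unordered samples of size 3 from {1, 2, ..., 20}.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def sample_variance(xs):
    """Sample variance with divisor n - 1, computed exactly."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# One entry per unordered sample {x1, x2, x3} drawn without replacement
counts = Counter(sample_variance(s) for s in combinations(range(1, 21), 3))

print(sum(counts.values()))    # number of samples
print(len(counts))             # number of distinct values of s^2
for value in sorted(counts)[:5]:
    print(value, counts[value])
```

The first five rows printed correspond to the first five columns of the table above.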

The graph indicates that large values of the variance are unusual. The graph also indi- 
cates that the probability distribution of s* is probably not normal. However the sample size 
is quite small, so we can’t draw any definite conclusions here. 

We do see that the probability distribution shown in Figure 4.12 strongly resembles 
that suggested by Figure 4.11. 


Figure 4.11 Sampling distribution for sample variances (horizontal axis: variance, 0 to 100; vertical axis: frequency).

Figure 4.12 A probability distribution suggested by Figure 4.11.

Figure 4.13 Distribution of sample variances chosen from a standard normal distribution.


As another example, 500 samples of size 5 each were selected from a standard normal 
distribution. The sample variance for each of these samples was computed; the results are 
shown in the histogram in Figure 4.13. We now see a distribution with a long tail that 
resembles the probability distribution shown in Figure 4.14. Figure 4.14 is in fact a graph 
of the probability distribution of a chi-squared distribution with 4 degrees of freedom. We 
now show that this is the probability distribution of a function of the sample variance $s^2$.

The sample variance $s^2$ is a very complex random variable, since it involves $\bar{x}$, which,
in addition to the sample values themselves, varies from sample to sample. To narrow our
focus, suppose now that the sample comes from a normal distribution $N(\mu, \sigma^2)$. Note that
no such distributional restriction was necessary in discussing the distribution of the sample
mean. The restriction to normality is common among functions of the sample values other
than the sample mean, and, although much is known when this restriction is lifted, we
cannot discuss this in this book.

We now present a fairly plausible derivation of the distribution of a function of the
sample variance provided that the sample is chosen from a $N(\mu, \sigma^2)$ distribution.

From the definition of the sample variance, we can write


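$$\frac{(n-1)s^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sigma^2}.$$


Now the sum in the numerator can be written as


$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} [(x_i - \mu) - (\bar{x} - \mu)]^2.$$


This in turn simplifies to


$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2,$$


so


$$\frac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^2} - \frac{(\bar{x} - \mu)^2}{\sigma^2/n}$$


or

$$\frac{(n-1)s^2}{\sigma^2} + \frac{(\bar{x} - \mu)^2}{\sigma^2/n} = \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^2}.$$


It can be shown, in sampling from a normal population, that $\bar{X}$ and $s^2$ are independent.
This fact is far from being intuitively obvious; its proof is beyond the scope of this book
but a proof can be found in Hogg and Craig [18]. Using this fact of independence it follows
that

$$M\left[\frac{(n-1)s^2}{\sigma^2}; t\right] \cdot M\left[\frac{(\bar{x} - \mu)^2}{\sigma^2/n}; t\right] = M\left[\sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^2}; t\right],$$

where $M[X; t]$ denotes the moment generating function. Now $\sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^2}$ is the sum of
squares of independent N(0, 1) variables and hence has a chi-squared distribution with n degrees of
freedom. Also $\frac{(\bar{x} - \mu)^2}{\sigma^2/n} = \left( \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \right)^2$ is the square of a single N(0, 1) variable and so has a
chi-squared distribution with 1 degree of freedom. Therefore, using the moment generating
function for the chi-squared random variable, we have


$$M\left[\frac{(n-1)s^2}{\sigma^2}; t\right] \cdot (1 - 2t)^{-1/2} = (1 - 2t)^{-n/2}$$

or

$$M\left[\frac{(n-1)s^2}{\sigma^2}; t\right] = (1 - 2t)^{-(n-1)/2},$$


indicating that $\frac{(n-1)s^2}{\sigma^2}$ has a $\chi^2_{n-1}$ distribution.

Since it can be shown that $E(\chi^2_{n-1}) = n - 1$, it follows that $E\left[\frac{(n-1)s^2}{\sigma^2}\right] = n - 1$, from
which it follows that

$$E(s^2) = \sigma^2,$$

showing that $s^2$ is an unbiased estimator for $\sigma^2$.
It is also true that $\mathrm{Var}(\chi^2_{n-1}) = 2(n - 1)$, so


$$\mathrm{Var}\left[\frac{(n-1)s^2}{\sigma^2}\right] = \frac{(n-1)^2}{\sigma^4} \mathrm{Var}(s^2) = 2(n - 1), \quad \text{or}$$

$$\mathrm{Var}(s^2) = \frac{2\sigma^4}{n - 1}.$$


This shows that the sample variance is very variable: its variance is a multiple of the
fourth power of the standard deviation. The variability of the sample variance was noted in
the early part of this section and this result verifies that observation.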
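
These two results can be checked by simulation. The sketch below is a minimal illustration, assuming samples of size 5 from a standard normal distribution; the seed and the number of repetitions are arbitrary choices. It compares the average and the variance of the simulated sample variances with $\sigma^2$ and $2\sigma^4/(n-1)$.

```python
import random
import statistics

random.seed(2)                       # arbitrary fixed seed for repeatability
mu, sigma, n, reps = 0.0, 1.0, 5, 20_000

# One sample variance (divisor n - 1) per simulated sample of size n
variances = [
    statistics.variance([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(reps)
]

mean_s2 = statistics.fmean(variances)    # should be near sigma^2
var_s2 = statistics.variance(variances)  # should be near 2 sigma^4 / (n - 1)
theory_var = 2 * sigma**4 / (n - 1)

print(mean_s2, sigma**2)
print(var_s2, theory_var)
```

The agreement is only approximate, since the printed values are themselves estimates based on a finite number of samples.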

Also in the early part of this section, we considered the sampling distribution of the 
sample variance when we took samples of size three from the uniform distribution on the 
integers {1,2,3,...,20}. The graph in Figure 4.11 resembles that in Figure 4.12 which in 
reality is a chi-squared distribution with 2 degrees of freedom. Figure 4.11, while at first 
appearing to be somewhat chaotic, is in reality remarkable since the sampling is certainly 
not done from a normal distribution with mean 0 and variance 1. This indicates that the
sampling distribution of the sample variance may be somewhat robust, that is, insensitive 
to deviations from the assumptions used to derive it. 


Example 4.13.1 


Samples of size 5 are drawn from a normal population with mean 20 and variance 300. A
95% confidence interval for the sample variance, $s^2$, is found by using the $\chi^2_4$ curve whose
graph is shown in Figure 4.14. A table of values for some chi-squared distributions can be
found in Appendix B.

The normal distribution has a point of symmetry and this is often used in calculations.
The chi-squared distribution, however, has no point of symmetry and so tables
must be used to find both upper and lower significance points. We find, for example,
that


$$P(0.2972011 < \chi^2_4 < 10.0255) = 0.95$$

so

$$P\left( 0.2972011 < \frac{4 s^2}{300} < 10.0255 \right) = 0.95$$

or

$$P(22.290 < s^2 < 751.9125) = 0.95,$$

a very large range for $s^2$. It is approximately true that

$$P(4.721 < s < 27.42) = 0.95$$


by taking square roots in the confidence interval for $s^2$. The exact distribution for s could
be found by finding the probability distribution of the square root of a $\chi^2$ distribution,
but the above interval is a good approximation. We will consider the exact distribution in
Section 4.17.

Figure 4.14 $\chi^2_4$ distribution.

Other 95% confidence intervals are possible. Another example is


$$P(0.48442 < \chi^2_4 < 11.1433) = 0.95,$$

which leads to the interval

$$P(36.3315 < s^2 < 835.7475) = 0.95.$$


There are many other possibilities which can most easily be found with the aid of a 
computer algebra system since tables give very restricted choices for the chi-squared values 
needed. Note that the two 95% confidence intervals above have unequal lengths. This is due 
to the lack of symmetry of the chi-squared distribution. 
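
For 4 degrees of freedom the chi-squared probabilities can be checked without tables. The sketch below relies on the standard closed form $F(x) = 1 - e^{-x/2}(1 + x/2)$ for the $\chi^2_4$ distribution function, which is not derived in this section and is assumed here; it recomputes the probability and the resulting limits on $s^2$ for both intervals of Example 4.13.1.

```python
import math

def chi2_4_cdf(x):
    """Distribution function of chi-squared with 4 degrees of freedom."""
    return 1 - math.exp(-x / 2) * (1 + x / 2)

sigma2, n = 300, 5
results = []
for lo, hi in [(0.2972011, 10.0255), (0.48442, 11.1433)]:
    prob = chi2_4_cdf(hi) - chi2_4_cdf(lo)
    # (n - 1) s^2 / sigma^2 lies between lo and hi, so rescale to s^2
    s2_lo = lo * sigma2 / (n - 1)
    s2_hi = hi * sigma2 / (n - 1)
    results.append((prob, s2_lo, s2_hi))
    print(round(prob, 4), round(s2_lo, 4), round(s2_hi, 4), round(s2_hi - s2_lo, 4))
```

The last column printed is the length of each interval, which makes the inequality of the two lengths explicit.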


EXERCISES 4.13 


1. A sample of five "Six Hour" VCR tapes had actual lengths (in minutes) of 366, 339,
364, 356, and 379 minutes. Find a 95% confidence interval for $\sigma^2$, assuming that the
lengths are $N(\mu, \sigma^2)$.

2. It is crucial that the variance of a measurement of the length of a piston rod be no greater
than 1 square unit. A sample gave the following lengths (which have been coded for
convenience): −3, 6, −7, 8, 4, 0, 2, 12, −8. Find a one-sided 99% confidence interval
for the true variance of the length measurements.

3. Suppose $X \sim N(\mu, \sigma^2)$ where $\mu$ is known. Find a 95% two-sided confidence interval
for $\sigma^2$ based on a random sample of size n.


4. Suppose that $\{X_1, X_2, \ldots, X_{2n}\}$ is a random sample from a distribution with E[X] = 0
and $\mathrm{Var}[X] = \sigma^2$. Find k if


$$E[k \cdot \{(X_1 - X_2)^2 + (X_3 - X_4)^2 + (X_5 - X_6)^2 + \cdots + (X_{2n-1} - X_{2n})^2\}] = \sigma^2.$$


5. A random sample of n observations from $N(\mu, \sigma^2)$ has $s^2 = 42$ and produced a
two-sided 95% confidence interval for $\sigma^2$ of length 100. Find n.

6. Six readings on the amount of calcium in drinking water gave $s^2 = 0.0285$. Find a 90%
confidence interval for $\sigma^2$.


7. A random sample of 12 observations is taken from a normal population with variance
100. Find the probability that the sample variance is between 50 and 240.


8. A random sample of 12 shearing pins is taken in a study of the Rockwell hardness of
the head of a pin. Measurements of the Rockwell hardness were made for each of the
12 giving a sample average of 50 with a sample standard deviation of 2. Find a 90%
confidence interval for the true variance of the Rockwell hardness. What assumptions
must be made for your analysis to be correct?


9. A study of the fracture toughness of base plate of 18% nickel maraging steel gave
$s^2 = 5.04$ based on a sample of 22 observations. Assuming that the sample comes
from a normal population, construct a 99% confidence interval for $\sigma^2$, the true
variance.


4.14 HYPOTHESIS TESTS AND CONFIDENCE 
INTERVALS FOR A SINGLE MEAN 


We are now prepared to return to the structure of hypothesis testing considered in 
Chapter 2 and to show some applications of the preceding theory to the statistical analysis 
of data. Only the binomial distribution was available to us in Chapter 2. Now we have 
not only continuous distributions but also the central limit theorem, which is the basis for 
much of our analysis. We begin with an example. 


Example 4.14.1 


A manufacturer of steel has measured the hardness of the steel produced and has found
that the hardness, $X$, has had in the past a mean value of 2200 lb with a known standard
deviation of 4591.84 lb. It is desired to detect any significant shift in the mean value, and for
this purpose samples of 25 pieces of the steel are taken periodically and the mean hardness
of the sample, $\bar{X}$, is found. The manufacturer is willing to have the probability of a Type I
error no greater than 0.05. When should the manufacturer decide that the steel no longer
has mean hardness 2200 lb?

In this case, since it is desired to detect deviations either greater than or less than
2200 lb, we take as null and alternative hypotheses

$$H_0\colon \mu = 2200$$
$$H_1\colon \mu \neq 2200.$$

The central limit theorem tells us that

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

is approximately a $N(0,1)$ variable.

Since the alternative hypothesis is two-sided, that is, it comprises the two one-sided
hypotheses $\mu > 2200$ and $\mu < 2200$, we take a two-sided rejection region,
$\{\bar{X} > k\} \cup \{\bar{X} < h\}$. Since $\alpha = 0.05$, we find $k$ and $h$ such that

$$P[\bar{X} > k] = P[\bar{X} < h] = 0.025$$

so that

$$\frac{k - 2200}{\sqrt{21085000/25}} = 1.96 \quad\text{and}\quad \frac{h - 2200}{\sqrt{21085000/25}} = -1.96.$$

These equations give $k = 4000$ and $h = 400$ approximately. So $H_0$ is accepted if
$400 < \bar{X} < 4000$.
The size of the Type II error, $\beta$, is a function of the specific alternative hypothesis. In
this case if the alternative is $H_1\colon \mu = 2600$, for example, then

$$\beta = P[400 < \bar{X} < 4000 \mid \mu = 2600]$$

$$= P\left[\frac{400 - 2600}{\sqrt{21085000/25}} < Z < \frac{4000 - 2600}{\sqrt{21085000/25}}\right]$$

$$= P[-2.39555 < Z < 1.5244]$$

$$= 0.927998,$$

so the test is not particularly sensitive to this alternative.
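The calculation in Example 4.14.1 can be checked numerically. The following is a minimal sketch using only Python's standard library; the variable names and the helper `type_ii_error` are illustrative choices, not part of the text.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 2200.0             # hypothesized mean under H0
variance = 21_085_000.0  # known population variance
n = 25                   # sample size
alpha = 0.05             # probability of a Type I error

# Standard deviation of the sample mean.
se = sqrt(variance / n)

# Two-sided test: put alpha/2 in each tail.
z = NormalDist().inv_cdf(1 - alpha / 2)
k = mu0 + z * se   # upper critical value for the sample mean
h = mu0 - z * se   # lower critical value for the sample mean

def type_ii_error(mu_alt):
    """Probability that the sample mean lands in (h, k) when the true mean is mu_alt."""
    sampling = NormalDist(mu_alt, se)
    return sampling.cdf(k) - sampling.cdf(h)

beta = type_ii_error(2600.0)
print(round(h, 1), round(k, 1), round(beta, 4))
```

Calling `type_ii_error` with other alternatives shows how the Type II error shrinks as the true mean moves away from 2200.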


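Confidence Intervals, σ Known


Suppose that $\bar{X}$ is the mean of a sample of $n$ observations selected from a population with
known standard deviation $\sigma$. By the central limit theorem, for a given $\alpha$ we can find $z$ so
that

$$P\left[-z \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z\right] = 1 - \alpha.$$

These inequalities can in turn be solved for $\mu$, producing $100(1 - \alpha)\%$ confidence intervals

$$\bar{X} - z\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z\,\frac{\sigma}{\sqrt{n}}.$$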


Example 4.14.2
A sample of 10 observations from a normal distribution with $\sigma = 6$ gave a sample mean
$\bar{x} = 28.45$. A 90% confidence interval for the unknown mean, $\mu$, of the population is

$$28.45 - 1.645 \cdot \frac{6}{\sqrt{10}} \le \mu \le 28.45 + 1.645 \cdot \frac{6}{\sqrt{10}}$$

or

$$25.329 \le \mu \le 31.571.$$
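A confidence interval of this kind is easy to compute directly from the standard normal quantile rather than from a table value. The sketch below assumes a known standard deviation; the function name `z_interval` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    # For confidence 1 - alpha, use the quantile that leaves alpha/2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Data of Example 4.14.2: n = 10, sigma = 6, sample mean 28.45, 90% confidence.
low, high = z_interval(28.45, 6.0, 10, 0.90)
print(round(low, 3), round(high, 3))
```

Because the half width depends only on the confidence level, the standard deviation, and the sample size, every sample of the same size gives an interval of the same length.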


Example 4.14.3
How large a sample must be selected from a normal distribution with standard deviation 12
in order to estimate $\mu$ to within 2 units with probability 0.95?

Here half the length of a 95% confidence interval is 2. So

$$1.96 \cdot \frac{12}{\sqrt{n}} = 2 \quad\text{so}\quad n = \left(\frac{1.96 \cdot 12}{2}\right)^2 = 138.30.$$

Therefore a sample size of $n = 139$ is sufficient.
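The same sample size calculation can be written as a short function. This is a sketch under the assumption that the standard deviation is known; the name `sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma, margin, confidence):
    """Smallest n for which the half width z * sigma / sqrt(n) does not exceed margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

# Example 4.14.3: sigma = 12, estimate the mean to within 2 units with probability 0.95.
n = sample_size(12.0, 2.0, 0.95)
print(n)
```

Halving the margin roughly quadruples the required sample size, since the margin enters the formula squared.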


Student’s t Distribution 


In the previous example, it was assumed that $\sigma^2$ was known. What if this parameter is also
unknown? This of course is the most commonly encountered situation in practice, that is,
neither $\mu$ nor $\sigma$ is known when the sampling is done. Although we will not prove it here,
the following theorem is useful if the sampling is done from a normal population.


Theorem: The ratio of a standard normal random variable and the square root of
an independent chi-squared random variable divided by $n$, its degrees of freedom, follows
a Student's $t$ distribution with $n$ degrees of freedom. Symbolically,

$$\frac{N(0,1)}{\sqrt{\chi^2_n / n}} = t_n.$$

A proof can be found in Hogg and Craig [18].


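How is this of help here? We know that $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is approximately normal by the central
limit theorem and we know from the previous section that, if the sampling is done from a normal
population, then $\dfrac{(n-1)s^2}{\sigma^2}$ is a chi-squared random variable with $n - 1$ degrees of freedom,
independent of $\bar{X}$. So

$$\frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)s^2}{\sigma^2 (n-1)}}} = \frac{\bar{X} - \mu}{s/\sqrt{n}} = t_{n-1}.$$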


The sample then provides all the information we need to calculate $t$. The Student's
$t$ distribution (which was discovered by W. S. Gosset, who wrote using the pseudonym
"Student") becomes normal-like as the sample size increases but differs significantly
from the normal distribution for small samples. Several $t$ distributions are shown
in Figure 4.15. A table of critical values for various $t$ distributions can be found in
Appendix B.

Now tests of hypotheses can be carried out and confidence intervals can be calculated if 
the sampling is from a normal distribution with unknown variance as the following example 
indicates. 


Example 4.14.4 


Tests on a ball bearing manufactured in a day's run in a plant show the following diameters
(which have been coded for convenience): 8, 7, 3, 5, 9, 4, 10, 2, 6, 7.

Figure 4.15 Student distributions for 3, 8, and 20 degrees of freedom.

The sample gives $\bar{x} = 6.1$ and $s^2 = 203/30$.
If we wished to test $H_0\colon \mu = 7$ against the alternative $H_1\colon \mu \neq 7$ with $\alpha = 0.05$, we find
that

$$t_9 = \frac{6.1 - 7}{\sqrt{(203/30)/10}} = -1.09.$$

A table of $t$ values can be found in Appendix B. The critical values for $t_9$ are $\pm 2.26$, so
the hypothesis is accepted.
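Confidence intervals for $\mu$ can also be constructed. Using the sample data, we have

$$P\left[-2.26 \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le 2.26\right] = 0.95 \quad\text{so}$$

$$P\left[-2.26 \le \frac{6.1 - \mu}{\sqrt{(203/30)/10}} \le 2.26\right] = 0.95,$$

which simplifies as

$$P[4.2409 \le \mu \le 7.9591] = 0.95.$$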
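The test statistic and interval for the ball bearing data can be recomputed from the raw observations. The sketch below uses only the standard library, so the critical value 2.262 for 9 degrees of freedom is entered by hand as a table value (an assumption of this sketch, not something the code derives).

```python
from math import sqrt
from statistics import mean, variance

diameters = [8, 7, 3, 5, 9, 4, 10, 2, 6, 7]  # coded data of Example 4.14.4
n = len(diameters)

xbar = mean(diameters)      # sample mean
s2 = variance(diameters)    # sample variance with divisor n - 1
se = sqrt(s2 / n)           # estimated standard error of the mean

# Test of H0: mu = 7 against a two-sided alternative.
t_stat = (xbar - 7) / se

# Table value for a two-sided 5% test with n - 1 = 9 degrees of freedom.
t_crit = 2.262
reject = abs(t_stat) > t_crit

# 95% confidence interval for the mean.
low, high = xbar - t_crit * se, xbar + t_crit * se
print(round(t_stat, 3), reject, round(low, 3), round(high, 3))
```

The hypothesized value 7 lies inside the computed interval exactly when the two-sided test does not reject, which is the duality between tests and confidence intervals described in the text.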


The 95% confidence interval is also the acceptance region for a hypothesis tested at
the 5% level. Recall that, when $\sigma$ is known, the confidence intervals arising from separate
samples all have the same length. This was shown above. If, however, $\sigma$ is unknown, then
the confidence intervals will have varying widths as well as various central values. Some
possible 95% confidence intervals are shown in Figure 4.16.

Figure 4.16 Some confidence intervals.


p Values


We have always given the $\alpha$ or significance value when constructing a test of a hypothesis.
These values of $\alpha$ have an arbitrary appearance, to say the least. Who is to say that this
significance level should be 0.05 or 0.01, or some other value? How does one decide what
value to choose?

These are often troublesome questions for an experimenter. The acceptance or rejection
of a hypothesis is of course completely dependent on the choice of the significance level.
Another way to report the result of a test would be to report the smallest value of $\alpha$ at which
the test results would be significant. This is called the p value for the test. We give some
examples of this.


Example 4.14.5 

In Example 4.14.4, we found that the sample of size 10 gave $\bar{x} = 6.1$ and $s^2 = 203/30$.
This in turn produced $t_9 = -1.09$ and the hypothesis was accepted since $\alpha$ had been chosen
as 5%.

However, we can use tables or a computer algebra system to find that

$$P(t_9 < -1.09) = 0.152018.$$

This means that the observed value for $t$ would be in the rejection region if half the $\alpha$
value were greater than 0.152018.

Since the test in this case is two sided, we report the p value as twice the above value,
or 0.304036.

Now the person interpreting the test results can decide if this value suggests that the 
results are significant or not. Undoubtedly the decision here would be that the result is not 
significant although this p value would be of value and interest in many studies. 
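When no table or computer algebra system is at hand, the tail probability of a $t$ variable can be approximated by integrating its density numerically. The sketch below uses the standard density of Student's $t$ distribution and Simpson's rule; the function names and the number of integration steps are illustrative choices.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and the symmetry of the density.

    steps must be even for Simpson's rule.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

# Example 4.14.5: observed t = -1.09 with 9 degrees of freedom, two-sided test.
one_sided = t_cdf(-1.09, 9)
p_value = 2 * one_sided
print(round(one_sided, 4), round(p_value, 4))
```

Doubling the one-sided tail area is appropriate here only because the alternative is two-sided and the $t$ density is symmetric about zero.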


Example 4.14.6 

Suppose we revise Example 4.14.1 as follows. Suppose the hypotheses are

$$H_0\colon \mu = 2200$$
$$H_1\colon \mu > 2200$$

and that a sample of size 25 gave a sample mean of $\bar{x} = 3945$.
Since we know in this case that $\sigma^2 = 21{,}085{,}000$, we find that

$$z = \frac{3945 - 2200}{\sqrt{21085000/25}} = 1.90011$$

and that

$$P(Z > 1.90011) = 0.0287094,$$

so this is the p value for this test. If the significance level is greater than 0.0287094 then the
result is significant; otherwise, it is not. Many computer statistical packages now report p
values together with other test results.
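For a test based on the normal distribution the p value needs nothing beyond the standard normal distribution function. A minimal sketch for the numbers in Example 4.14.6, with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

mu0 = 2200.0             # mean under H0
variance = 21_085_000.0  # known population variance
n = 25
xbar = 3945.0            # observed sample mean

z = (xbar - mu0) / sqrt(variance / n)

# One-sided alternative mu > 2200: the p value is the upper tail area beyond z.
p_value = 1 - NormalDist().cdf(z)
print(round(z, 5), round(p_value, 6))

# The result is significant at level alpha exactly when p_value < alpha.
for alpha in (0.05, 0.01):
    print(alpha, p_value < alpha)
```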


EXERCISES 4.14 


1. Test runs with an experimental engine showed they operated, respectively, for 24, 28,
21, 23, 32, and 22 minutes with 1 gallon of fuel.

(a) Is this evidence at the 1% significance level that $H_0\colon \mu = 29$ should be accepted
against $H_1\colon \mu < 29$?

(b) Find the p value for the test.


2. Machines used in producing a particular brand of yarn are given periodic checks to
help ensure stable quality. A machine has been set so that it is expected that strands of
yarn it produces will have breaking strength $\mu = 19.50$ oz, with a standard deviation
of 1.80 oz. A random sample of 12 pieces of yarn has a mean of 18.46 oz. Assuming
that the standard deviation remains constant over a fairly wide range of values for $\mu$,
(a) Test $H_0\colon \mu = 19.50$ against $H_1\colon \mu \neq 19.50$ at the 5% significance level. Find the p
value for the test.
(b) Now suppose that $\sigma$ is also unknown and that the sample standard deviation is
1.80. Test the hypothesis in part (a) again. Are any additional assumptions needed?
(c) Under the conditions in part (a), find $\beta$ for the alternative $H_1\colon \mu = 19.70$.

3. "One quarter" inch rivets are produced by a machine which is checked periodically by
taking a random sample of 10 rivets and measuring their diameters. It is feared that
the wear-off factor in the machine will eventually cause the machine to produce rivets
with diameters that are less than 1/4 inch. Assume that the variance of the diameters is
known to be $(0.0015)^2$.

(a) Describe the critical region, in terms of $\bar{X}$, for a test at the 1% level of significance
for $H_0\colon \mu = 0.25$ against the alternative $H_1\colon \mu < 0.25$.

(b) What is the power of the test at $\mu = 0.2490$?

(c) Now suppose we wish to test $H_0\colon \mu = 0.25$ against $H_1\colon \mu = 0.2490$ with $\alpha = 1\%$
and so that the power of the test is 0.99. What sample size is necessary to achieve
this?

4. A manufacturer of light bulbs claims that the life of the bulbs is normally distributed
with mean 800 hours and standard deviation 40 hours. Before buying a large lot, a
buyer tests 30 of the bulbs and finds an average life of 789 hours.

(a) Test the hypothesis $H_0\colon \mu = 800$ against the alternative $H_1\colon \mu < 800$ using a test
of size 5%.

(b) Find the probability of a Type II error for the alternative $H_1\colon \mu = 790$.
(c) Find the p value for the test.


5. A sample of size 16 from a distribution whose variance is known to be 900 is used to test
$H_0\colon \mu = 350$ against the alternative $H_1\colon \mu > 350$, using the critical region $\bar{X} > 365$.
(a) What is $\alpha$ for this test?
(b) Find $\beta$ for the alternative $H_1\colon \mu = 372.50$.


6. A manufacturer of sports equipment has developed a new synthetic fishing line that he
claims has a mean breaking strength of 8 kg. To test $H_0\colon \mu = 8$ against the alternative
$H_1\colon \mu \neq 8$, a sample of 50 lines is tested; the sample has a mean breaking strength of
7.8 kg.

(a) If $\sigma$ is assumed to be 0.5 kg and $\alpha = 5\%$, is the manufacturer's claim supported by
the sample?

(b) Find $\beta$ for the above test for the alternative $H_1\colon \mu = 7.7$.

(c) Find the p value for the test.


7. For a certain species of fish, a sample of measurements for DDT is 5, 10, 8, 7, 4, 9, and
13 parts per million.

(a) Find a range of values of $\mu_0$ for which the hypothesis $H_0\colon \mu = \mu_0$ would be
accepted at the 5% level.

(b) Find a 95% confidence interval for $\sigma^2$, the true variance of the measurements.


8. The time to repair breakdowns for an office copying machine is claimed by the
manufacturer to have a mean of 93 minutes. To test this claim, 23 breakdowns of a model
were observed, resulting in a mean repair time of 98.8 minutes and a standard deviation
of 26.6 minutes.

(a) Test $H_0\colon \mu = 93$ against the alternative $H_1\colon \mu > 93$ with $\alpha = 5\%$ and state your
conclusions.

(b) Supposing that $\sigma^2 = 625$, find $\beta$ for the alternative $H_1\colon \mu = 95$.

(c) Find the p value for the test.


9. A firm produces metal wheels. The mean diameter of these wheels should be 4 in.
Because of other factors as well as chance variation, the diameters of the wheels vary
with standard deviation 0.05 in. A test is conducted on 50 randomly selected wheels.
(a) Find a test with $\alpha = 0.01$ for testing $H_0\colon \mu = 4$ against the alternative $H_1\colon \mu \neq 4$.
(b) If the sample average is 3.97, what decision is made?

(c) Calculate $\beta$ for the alternative $H_1\colon \mu = 3.99$.

10. A tensile test was performed to determine the strength of a particular adhesive for
a glass-to-glass assembly. The data are: 16, 14, 19, 18, 19, 20, 15, 18, 17, 18. Test
$H_0\colon \mu = 19$ against the alternative $H_1\colon \mu < 19$,

(a) if $\sigma^2$ is known to be 2.

(b) if $\sigma^2$ is unknown.

11. The activation times for an automatic sprinkler system are a subject of study by the
system's manufacturer. A sample of activation times is 27, 41, 22, 27, 23, 35, 30, 33,
24, 27, 28, 22, and 24 seconds. The design of the system calls for its activation in at
most 25 seconds. Does the data contradict the validity of this design specification?
12. The breaking strengths of cables produced by a manufacturer have mean 1800 lb. It is
claimed that a new manufacturing process will increase the mean breaking strength of
the cables. To test this hypothesis, a sample of 30 cables, manufactured using the new
process, is tested giving $\bar{X} = 1850$ and $s = 100$.

(a) If $\alpha = 0.05$, what conclusion can be drawn regarding the new process?

(b) Find the p value for the test.

13. A sample of 80 observations is taken from a population with known standard deviation
56 to test $H_0\colon \mu < 300$ against the alternative $H_1\colon \mu > 300$, giving $\bar{X} = 310$.

(a) Find the minimum value of $\alpha$ so that $H_0$ would be rejected by the sample.

(b) Assuming that the critical region is $\bar{X} > 310$, find $\beta$ for the alternative $\mu = 315$.

14. A contractor must have cement with a compressive strength of at least 5000 kg/cm². He
knows that the standard deviation of the compressive strengths is 120. In order to test
$H_0\colon \mu = 5000$ against the alternative $H_1\colon \mu < 5000$, a random sample of four pieces of
cement is tested.

(a) If the average compressive strength of the sample is 4870 ksc, is the concrete
acceptable? Use $\alpha = 0.01$.

(b) The contractor must be 95% certain that the compressive strength is not less than
4800 ksc. How large a sample should be taken to ensure this?

15. The assembly time in a plant is a normal random variable with mean 18.5 seconds and
standard deviation 2.4 seconds.

(a) A random sample of 10 assembly times gave $\bar{X} = 19.6$. Is this evidence that
$H_0\colon \mu = 18.5$ should be rejected in favor of the alternative $H_1\colon \mu > 18.5$ if $\alpha = 5\%$?

(b) Find the probability that $H_0$ is accepted if $\mu = 19$.

(c) It is very important that the assembly time not exceed 20 seconds. How large a
sample is necessary to reject $H_0\colon \mu = 18.5$ with probability 0.95 if $\mu = 20$?

16. A lot of rolls of paper is acceptable for making bags for grocery stores if its true mean
breaking strength is not less than 40 lb. It is known from past experience that $\sigma = 2.5$
lb. A sample of 20 is chosen.

(a) Find the critical region for testing the hypothesis $H_0\colon \mu = 40$ against the alternative
$H_1\colon \mu < 40$ at the 5% level of significance.

(b) Find the probability of accepting $H_0$ if in fact $\mu = 40.5$ lb.

(c) If $\sigma$ were unknown and a sample of 20 gave $\bar{X} = 39$ lb and $s = 2.4$ lb, would $H_0$
be accepted with $\alpha = 5\%$?

17. The drying time of a particular brand and type of paint is known to be normally
distributed with $\mu = 75$ minutes and $\sigma = 9.4$ minutes. In an attempt to improve the drying
time, a new additive has been developed. Use of the additive in 100 test samples of the
paint gave an average drying time of 68.5 minutes. We wish to test $H_0\colon \mu = 75$ against
the alternative $H_1\colon \mu < 75$.

(a) Find the critical region if $\alpha = 5\%$.

(b) Does the experimental evidence indicate that the additive improves drying time?

(c) What is the probability that $H_0$ will be rejected if in fact $\mu = 72$ minutes?

(d) Find the p value for the test.
18. The breaking strength of a fiber used in manufacturing cloth is required to be not less
than 160 lb/in.² Past evidence indicates that $\sigma = 3$ psi. A random sample of four
specimens is tested and the average breaking strength is found to be 158 psi.

(a) Test $H_0\colon \mu = 160$ against a suitable alternative using $\alpha = 5\%$.
(b) Find $\beta$ for the alternative $\mu = 157$.

19. An engineer is investigating the wear characteristics of a particular type of radial
automobile tire used by the company fleet of cars. A random sample of 16 tires is selected
and each tire used until the wear bars appear. The sample gave $\bar{x} = 41{,}116$ and
$s^2 = 1{,}814{,}786$.

(a) Find $a$ so that $P(\mu > a) = 0.95$.
(b) Find a 90% confidence interval for $\sigma$.
(c) Answer part (a) assuming that the sample size is 43 with $\bar{x}$ and $s^2$ as before.

20. The diameter of steel rods produced by a sub-contractor is known to have standard
deviation 2 cm, and, in order to meet specifications, must have $\mu = 12$.

(a) If the mean of a sample of size 5 is 13.3, is this sufficient to reject $H_0\colon \mu = 12$ in
favor of the alternative $H_1\colon \mu > 12$? Use $\alpha = 0.05$.

(b) The manufacturer wants to be fairly certain that $H_0$ is rejected if $\mu = 13$. How large
a sample should be taken to make this probability 0.92?


4.15 HYPOTHESIS TESTS ON TWO SAMPLES


A basic scientific problem is that of comparing two samples, possibly one from a con- 
trol group and the other from an experimental group. The investigator may want to decide 
whether or not the two populations from which the samples are drawn have the same mean 
value, or interest may center on the equality of the true variances of the populations. We 
begin with a comparison of population means. 


Tests on Two Means 


Example 4.15.1 


Suppose an investigator is comparing two methods of teaching students to use a
popular computer algebra program. One group (X) is taught by the conventional
lecture-demonstration method while the second group (Y) is divided into small groups and
uses cooperative learning. After some time of instruction, the groups are given the same
examination with the following results:

Group    n     Mean    Variance
X        13    77      193.7
Y        9     84      309.4

We wish to test the hypothesis

$$H_0\colon \mu_x = \mu_y$$

against

$$H_1\colon \mu_x < \mu_y.$$

Assume that the sampling is from normal distributions. We know that

$$E(\bar{X} - \bar{Y}) = \mu_x - \mu_y$$

and that

$$\operatorname{Var}(\bar{X} - \bar{Y}) = \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}$$

so, from the central limit theorem,

$$z = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\sigma_x^2}{n_x} + \dfrac{\sigma_y^2}{n_y}}}$$

is a $N(0,1)$ variable.

Now $z$ can be used to test hypotheses or to construct confidence intervals if the variances are
known. Consider for the moment that we know that the populations have equal variances,
say $\sigma^2 = 289$. Then

$$z = \frac{(77 - 84) - 0}{\sqrt{\dfrac{289}{13} + \dfrac{289}{9}}} = -0.9496.$$

If the test had been at the 5% level then the null hypothesis would be accepted since
$z > -1.645$.
We could also use $z$ to construct a confidence interval. Here a one-sided interval is
appropriate because of $H_1$. We have

$$P\left[\frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\sigma^2}{n_x} + \dfrac{\sigma^2}{n_y}}} \le 1.645\right] = 0.95$$

which becomes in this case the interval $\mu_x - \mu_y \ge -19.126$. Since 0 is in this interval, the
hypothesis of equal means is then accepted.
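The two-sample $z$ calculation can be reproduced in a few lines. The sketch below takes the sample sizes 13 and 9 from the data quoted in Example 4.15.4 and treats the common variance 289 as known; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

xbar, n_x = 77.0, 13   # group X: sample mean and size
ybar, n_y = 84.0, 9    # group Y: sample mean and size
sigma2 = 289.0         # common population variance, assumed known

# Standard deviation of the difference of the two sample means.
se = sqrt(sigma2 / n_x + sigma2 / n_y)

# Test statistic under H0: mu_x = mu_y.
z = (xbar - ybar) / se

# One-sided 5% test against mu_x < mu_y: reject when z falls below about -1.645.
z_crit = NormalDist().inv_cdf(0.05)
reject = z < z_crit

# One-sided 95% lower bound for mu_x - mu_y.
lower_bound = (xbar - ybar) - NormalDist().inv_cdf(0.95) * se
print(round(z, 4), reject, round(lower_bound, 3))
```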


Example 4.15.2 


A situation more common than that in the previous example occurs when the population
variances are unknown. There are then two possibilities: they are equal or they are not.
We consider first the case where the variances are unknown, but they are known to be equal.
Denote the common value for the variances by $\sigma^2$. The variable

$$z = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sigma\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}}$$

is a $N(0,1)$ variable. Also

$$\frac{(n_x - 1)s_x^2}{\sigma^2} + \frac{(n_y - 1)s_y^2}{\sigma^2}$$

is a chi-squared variable with $(n_x - 1) + (n_y - 1) = n_x + n_y - 2$
degrees of freedom since each of the summands is a chi-squared variable. Since a $t$ variable
is the ratio of a $N(0,1)$ variable to the square root of a chi-squared variable divided by its
number of degrees of freedom, it follows that

$$t_{n_x + n_y - 2} = \frac{\dfrac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sigma\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}}}{\sqrt{\dfrac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{\sigma^2 (n_x + n_y - 2)}}}.$$

This can be simplified to

$$t_{n_x + n_y - 2} = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{s_p\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}} \quad\text{where}\quad s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}.$$

$s_p^2$ is called the pooled variance.

Using the data in Example 4.15.1, we find that

$$s_p^2 = \frac{12(193.7) + 8(309.4)}{20} = 239.98 \quad\text{and}\quad t_{20} = \frac{(77 - 84) - 0}{\sqrt{239.98\left(\dfrac{1}{13} + \dfrac{1}{9}\right)}} = -1.042.$$

Since the one-sided test rejects $H_0$ if $t_{20} < -1.725$, the hypothesis is accepted if $\alpha = 0.05$.
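The pooled variance and the resulting $t$ statistic follow directly from the summary statistics. This is a sketch with illustrative variable names; the critical value is the table value quoted in the text, since the standard library has no $t$ quantile function.

```python
from math import sqrt

n_x, xbar, s2_x = 13, 77.0, 193.7   # group X: size, mean, sample variance
n_y, ybar, s2_y = 9, 84.0, 309.4    # group Y: size, mean, sample variance

# Pooled variance: sample variances weighted by their degrees of freedom.
df = n_x + n_y - 2
sp2 = ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / df

# t statistic under H0: mu_x = mu_y.
t_stat = (xbar - ybar) / sqrt(sp2 * (1 / n_x + 1 / n_y))

# One-sided 5% critical value for 20 degrees of freedom (table value from the text).
t_crit = -1.725
reject = t_stat < t_crit
print(df, round(sp2, 2), round(t_stat, 3), reject)
```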


Example 4.15.3 


Finally, we consider the case where the population variances are unknown and cannot be
assumed to be equal. (Later in this chapter, we will show how that hypothesis may be tested
also.) Unfortunately, there is no exact solution to this problem, known in the statistical
literature as the Behrens-Fisher problem. Several approximate solutions are known; we
give one here due to Welch [36].

Welch's approximation is as follows:

The variable

$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}}$$

is approximately a $t$ variable with $\nu$ degrees of freedom where

$$\nu = \frac{\left(\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}\right)^2}{\dfrac{(s_x^2/n_x)^2}{n_x - 1} + \dfrac{(s_y^2/n_y)^2}{n_y - 1}}.$$

Using the data in the previous examples, we find that $\nu = 14.6081$ so we must use a $t$
variable with 14 degrees of freedom. This gives

$$T_{14} = -0.997178,$$

a result quite comparable to previous results. The critical $t$ value is $-1.761$. The Welch
approximation will make a very significant difference if the population variances are quite
disparate.
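Welch's statistic and its approximate degrees of freedom can be computed from the same summary statistics. The sketch below uses the usual Welch-Satterthwaite form of the degrees of freedom, which reproduces the value 14.6081 quoted in the text; the function name `welch` is a hypothetical choice.

```python
from math import floor, sqrt

def welch(xbar, s2_x, n_x, ybar, s2_y, n_y):
    """Return Welch's t statistic (for H0: equal means) and its approximate degrees of freedom."""
    vx, vy = s2_x / n_x, s2_y / n_y   # estimated variances of the two sample means
    t_stat = (xbar - ybar) / sqrt(vx + vy)
    nu = (vx + vy) ** 2 / (vx ** 2 / (n_x - 1) + vy ** 2 / (n_y - 1))
    return t_stat, nu

t_stat, nu = welch(77.0, 193.7, 13, 84.0, 309.4, 9)

# Rounding the degrees of freedom down is the conservative choice when using tables.
df_table = floor(nu)
print(round(t_stat, 6), round(nu, 4), df_table)
```

The degrees of freedom always lie between the smaller of $n_x - 1$ and $n_y - 1$ and the pooled value $n_x + n_y - 2$, so the approximation matters most when the two variance estimates are very different.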


Tests on Two Variances 


It is essential to determine whether or not the population variances are equal before testing
the equality of population means. It is possible to test this using two samples from the
populations.
If $\chi^2_a$ and $\chi^2_b$ are independent chi-squared variables then the random variable

$$\frac{\chi^2_a / a}{\chi^2_b / b} = F(a,b),$$

where $F(a,b)$ denotes the $F$ random variable with $a$ and $b$ degrees of freedom respectively.
A proof of this fact will not be given here. The reader is referred to Hogg and Craig [18] for
a proof. A table of some critical values of the $F$ distribution can be found in Appendix B.

The probability density function for $F(a,b)$ is

$$f(x) = \frac{\Gamma\left(\frac{a+b}{2}\right)}{\Gamma\left(\frac{a}{2}\right)\Gamma\left(\frac{b}{2}\right)}\, a^{a/2}\, b^{b/2}\, (ax + b)^{-(a+b)/2}\, x^{a/2 - 1}, \quad x > 0.$$

The $F$ variable has two numbers of degrees of freedom; one is associated with the
numerator and the other with the denominator. Due to the definition of $F$, it is clear that

$$\frac{1}{F(a,b)} = F(b,a).$$

So the reciprocal of an $F$ variable is an $F$ variable with the numbers of degrees of
freedom interchanged.
Several $F$ curves are shown in Figure 4.17.


Figure 4.17 Some F distributions.

The $F$ distribution can be used in testing the equality of variances in the following way.
If the sampling is from normal populations, then

$$\frac{(n_x - 1)s_x^2}{\sigma_x^2} \quad\text{and}\quad \frac{(n_y - 1)s_y^2}{\sigma_y^2}$$

are independent chi-squared variables. It follows then that

$$\frac{\dfrac{(n_x - 1)s_x^2}{\sigma_x^2 (n_x - 1)}}{\dfrac{(n_y - 1)s_y^2}{\sigma_y^2 (n_y - 1)}}$$

is the ratio of two independent chi-squared random variables each divided by its number of
degrees of freedom; it follows that this variable, which simplifies to

$$\frac{s_x^2 / \sigma_x^2}{s_y^2 / \sigma_y^2},$$

is an $F(n_x - 1, n_y - 1)$ variable.

Now consider the hypotheses

$$H_0\colon \sigma_x^2 = \sigma_y^2$$
$$H_1\colon \sigma_x^2 \neq \sigma_y^2.$$

If the null hypothesis is true, then the $F$ variable becomes

$$\frac{s_x^2}{s_y^2} = F(n_x - 1, n_y - 1).$$

This is used as the test statistic with a two-tailed critical region.


Example 4.15.4 


As an example, consider the data used in the previous section where

$$n_x = 13, \quad s_x^2 = 193.7 \quad\text{and}\quad n_y = 9, \quad s_y^2 = 309.4.$$

Here $F(12, 8) = \dfrac{193.7}{309.4} = 0.626$. The critical values, choosing $\alpha = 0.05$, are 4.1995 and
0.2002 so the null hypothesis is accepted.

One-sided tests can also be performed with the $F$ statistic; it is common not to worry if
a variance is too small, but in many instances, care must be taken that the variance has not
become too large. In this case, large variation may result in a production process producing
too great a percentage of product that does not meet specifications. We will discuss this
further in Section 4.17.
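The variance ratio test can also be carried out without tables by integrating the $F$ density given earlier. The sketch below uses Simpson's rule with the standard library only; the function names, the number of steps, and the restriction to numerator degrees of freedom of at least 2 (so that the density is finite at zero) are assumptions of this sketch.

```python
from math import gamma

def f_pdf(x, a, b):
    """Density of F(a, b) at x >= 0, in the form given in the text (assumes a >= 2)."""
    const = gamma((a + b) / 2) / (gamma(a / 2) * gamma(b / 2))
    return (const * a ** (a / 2) * b ** (b / 2)
            * x ** (a / 2 - 1) * (a * x + b) ** (-(a + b) / 2))

def f_cdf(x, a, b, steps=4000):
    """P(F(a, b) <= x) by Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = f_pdf(0.0, a, b) + f_pdf(x, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, a, b)
    return total * h / 3

# Data of Example 4.15.4.
s2_x, n_x = 193.7, 13
s2_y, n_y = 309.4, 9

ratio = s2_x / s2_y                          # observed value of F(12, 8)
lower_tail = f_cdf(ratio, n_x - 1, n_y - 1)

# Two-sided p value: twice the smaller tail area.
p_value = 2 * min(lower_tail, 1 - lower_tail)
accept = p_value > 0.05
print(round(ratio, 3), round(p_value, 3), accept)
```

Evaluating `f_cdf` at a tabulated critical value gives a quick check on the table entry; for instance, `f_cdf(4.1995, 12, 8)` should come out close to 0.975.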


EXERCISES 4.15 


1. To test the hypothesis that the resistance of wire can be reduced by at least
0.050 ohms by alloying, samples of 12 for each type of wire gave the following
results:

                  Mean     Standard Deviation
Alloyed wire      0.083    0.003
Standard wire     0.136    0.002

(a) Test the hypothesis that the two population variances are equal, using $\alpha = 0.05$.
(b) Does the data substantiate the claim?


2. Two analysts took repeated readings on the hardness of city water with the following 
results: 


Analyst A Analyst B 
x y 
0.46 0.82 
0.62 0.61 
0.37 0.89 
0.40 0.51 
0.44 0.33 
0.58 0.48 
0.48 0.23 
0.53 0.25 
0.67 
0.88 


(a) Test $H_0\colon \mu_x = 0.55$ against the alternative $H_1\colon \mu_x \neq 0.55$ using $\alpha = 0.05$.
(b) Test the hypothesis in part (a) again assuming now that $\sigma_x^2 = 0.0081$.
(c) Test $H_0\colon \mu_x = \mu_y$ against the alternative $H_1\colon \mu_x < \mu_y$ with $\alpha = 5\%$.

3. Over a long period of time, ten patients selected at random are each given two different
treatments for arthritis. The results of standard tests are as follows:


Patient   Treatment 1   Treatment 2
1 47 52 
2 38 35 
3 50 52 
4 33 35 
5 47 46 
6 23 2H, 
7 40 45 
8 42 41 
9 15 17 
10 36 41 


Test $H_0\colon \mu_1 = \mu_2$ against $H_1\colon \mu_1 < \mu_2$ at the 1% level of significance, assuming
that the population variances are equal.


4. An experiment compares two different processes for producing steel plate. The
measurements represent the thickness of the plate. The samples gave

$$n_x = 8, \quad \bar{x} = 6.701, \quad s_x = 0.108$$

$$n_y = 6, \quad \bar{y} = 6.841, \quad s_y = 0.155$$

(a) Using $\alpha = 0.02$, test $H_0\colon \sigma_x^2 = \sigma_y^2$ against the alternative $H_1\colon \sigma_x^2 \neq \sigma_y^2$.
(b) Now test $H_0\colon \mu_x = \mu_y$ against the alternative $H_1\colon \mu_x \neq \mu_y$ using a 5% test.


5. Samples from normal populations gave 
n,=6 %=22.6 = 102A 
ny =8 y= 31.9 s = 89.6 


Find a 98% confidence interval for $\sigma_x^2 / \sigma_y^2$.


6. In comparing times to failure (in hours) of two different types of light bulbs, two
samples gave

$$n_x = 13, \quad \bar{x} = 984, \quad s_x^2 = 8742$$

$$n_y = 15, \quad \bar{y} = 1121, \quad s_y^2 = 9411$$

Find a 95% confidence interval for the difference of the true population means, $\mu_x - \mu_y$,
(a) assuming $\sigma_x^2 = \sigma_y^2$.
(b) assuming $\sigma_x^2 = 9000$ and $\sigma_y^2 = 9500$.

7. In a batch chemical process, two catalysts are being compared for their effect on the
output of the process reaction. A sample of 11 batches was prepared using catalyst 1
and a sample of 9 batches was prepared using catalyst 2. The sample results are as
follows:


m= 11 3 =85 | = 16 
m= 9 mH 81 3) = 25. 


Assuming that the true variances are equal, find a 95% confidence interval for the
difference between the means, $\mu_1 - \mu_2$.


8. To determine yield strengths, a study of 10 pieces of cold-rolled steel (X) gave a sample
mean of 29.8 kilograms per square inch (ksi) and a sample variance of 4.2 ksi². A
second sample of 13 pieces of galvanized steel (Y) gave a sample mean of 34.7 ksi and
a sample variance of 4.9 ksi².

(a) Assuming the true variances are equal, find a 95% confidence interval for the
difference between the true strengths, $\mu_x - \mu_y$.


(b) Repeat part (a) assuming the true variances are o2 = 4 and oe =. 


9. An experiment is conducted to compare the crash resistance of two different types of
automobile bumpers. Type A bumpers were mounted on 12 cars and type B bumpers
were mounted on 9 cars. The cars were driven into a concrete wall at 10 mph and the
resulting damage (in $ to repair) was assessed. The results were as follows:


A= 235 5, = 421 
A= 286 55 =5il, 


(a) Would the hypothesis $\sigma_A^2 = \sigma_B^2$, when tested against $\sigma_A^2 \neq \sigma_B^2$ with $\alpha = 2\%$, be
accepted or rejected?
(b) Find a 98% confidence interval for $\sigma_A^2 / \sigma_B^2$.
(c) Test $H_0\colon \mu_A = \mu_B$ against the alternative $H_1\colon \mu_A < \mu_B$ in a test of size 1%.


10. The vending machines in the student lounge and in the cafeteria should dispense the
same amount of coffee. However, some students believe that the mean amount of coffee
dispensed in the lounge (L) is less than that dispensed in the cafeteria (C). The following
summary statistics were obtained from samples from each machine.


mh = 12%, = 101 = 08 
Nc =10 XC =9.8 sc= 14. 


Is there statistical evidence to support the students' claim? Assume that the amounts
dispensed are approximately normal and use a test of size 5%.


11. A recent study of accident victims in a Boston hospital gave the following results:

                  n     Mean    Standard Deviation
Seat Belts        15    565     220
No Seat Belts     12    1200    540

the data indicating the cost of hospitalization.




(a) Assuming that $\sigma_{SB}$ (the true standard deviation for seat belt wearers) is 220 and
$\sigma_{NSB}$ (the true standard deviation for non-seat-belt wearers) is 540, find a 95%
confidence interval for $\mu_{SB} - \mu_{NSB}$.

(b) Answer part (a) assuming now that the true standard deviations are unknown, but
equal.

(c) Is it tenable to believe that the population variances are equal? State the smallest
p value at which the data would reject the hypothesis of equal variances (against
the alternative of unequal variances).


12. The following data represent the running times of films produced by two motion picture 


companies: 
Company Time (minutes) 
A 102 86 98 109 92 
B 81 165 97 134 92 87 114 


(a) Test $H_0\colon \sigma_A^2 = \sigma_B^2$ against the alternative $H_1\colon \sigma_A^2 \neq \sigma_B^2$ with $\alpha = 2\%$.
(b) Test $H_0\colon \mu_A = \mu_B - 10$ against the alternative $\mu_A < \mu_B - 10$ with $\alpha = 1\%$.

13. Wire cable is manufactured by two processes. It is desired to determine if the pro- 
cess affects the mean breaking strength of the cable. Laboratory tests are performed 
by putting samples under tension and recording the load required to break the cable. 
Following is the sample data: 


Sample Size Mean Variance 
x 6 8.2 2.0 
Y 7 11.2 4.0 


(a) Test $H_0\colon \sigma_x^2 = \sigma_y^2$ against the alternative $H_1\colon \sigma_x^2 \neq \sigma_y^2$ with $\alpha = 2\%$.
(b) Now test $H_0\colon \mu_x = \mu_y$ against the alternative $H_1\colon \mu_x \neq \mu_y$ using a 5% test.

14. Five samples of a ferrous-type substance are to be used to determine if there is a dif- 
ference between a laboratory chemical analysis and an X-ray fluorescence analysis of 
iron content. Each sample was split into two sub-samples and the two types of analysis 
were applied with the following data, which represents per cent yield: 


                      Sample
Analysis    1       2      3      4      5
X-Ray       11.0    2.0    8.3    3.1    2.4
Chemical    11.2    1.9    8.5    3.3    2.4


Assuming the population of measurements to be normal, test whether or not the
two methods of analysis give, on average, the same result. Use $\alpha = 5\%$.


15. An automobile designer suggests that painting a racing car reduces its top speed. He
selects 6 cars and tests them with and without paint. The results are as follows:

         Top Speed (mph)
Car    Unpainted    Painted
1      189          186
2      186          185
3      183          179
4      188          184
5      185          183
6      188          186


Use the data to decide whether or not painting the cars reduces the top speed.
Use $\alpha = 5\%$.


16. The golf scores of two competitors, A and B, are recorded over a period of 10 days. 
Scores for the two golfers are recorded over a period of 10 different days on which 
weather conditions varied widely. Golfer A claims that his game is better than golfer 
B. Does the data support this claim? 


Day 


FP OANADMNFWNK 


A 


87 
86 
79 
82 
78 
87 
84 
81 
83 
81 


17. A wire manufacturer alters the production process hoping to increase the resistance 
of the wire. Below are the results of samples taken from the old process and the new 
process. Has the resistance of the wire increased? 


New Process Old Process 
0.140 0.135 
0.138 0.140 
0.143 0.136 
0.142 0.142 
0.144 0.138 
0.137 0.140 


18. Two randomly selected groups of industrial trainees are taught a new assembly line 
operation by two different methods. Measurements were made on the time to complete 
the operation with the following results: 


Group    Size    Mean     Standard Deviation
I        10      60.43    20.2
II       10      31.23    26.8

(a) Assuming normality, test $H_0\colon \sigma_I^2 = \sigma_{II}^2$ against the alternative $H_1\colon \sigma_I^2 \neq \sigma_{II}^2$ at the
5% level of significance.

(b) Test $H_0\colon \mu_I = \mu_{II}$ against the alternative $H_1\colon \mu_I \neq \mu_{II}$ at the 5% level of
significance.

(c) Determine the values of $k$ for which $H_0\colon \mu_I = \mu_{II} + k$ would be accepted at the 5%
level of significance when tested against the alternative $H_1\colon \mu_I \neq \mu_{II} + k$.


19. Two samples are drawn from normal populations, each with variance 100. Due to
sampling costs, it is possible to select 2n items from population A, but only n items
from population B. In testing $H_0: \mu_A = \mu_B$ against the alternative $H_a: \mu_A = \mu_B + 3$ it
is desired to have $\alpha = \beta = 0.10$. Find n so that this is approximately so and then discuss
the implications of rounding n to an integer.


20. Suppose samples of sizes ny and ny are available from normal populations whose vari- 
ances and means are unknown. It is suspected that oy — 202. 


s2 s2 

(a) Explain, by establishing the distribution of +, how —> can be used to test the 
p y g 52 52 
Sy Sy 


hypothesis that oy = 20, : 
(b) Assuming that the hypothesis in part a] is accepted, show that atest of H,: wy = Uy 
can be based on 


pe AA Y= x = HY) 
1 1 


ny ny 


, Where 


Sy 


(ny — 1)s% + (ny — 1s} 
W 2(ny + ny — 2) 


by establishing the distributions of r and s2.. 


4.16 LEAST SQUARES LINEAR REGRESSION 


The estimation of unknown parameters was considered in the beginning of this chapter; 
we return to that problem here and introduce a new principle of estimation, that of least 
squares. This is a commonly used principle when a straight line or other curve must be 
fitted to a set of data and when one wants the “best” fitting straight line or curve to the 
experimental data. 


Example 4.16.1 
Suppose we are given the data set $\{x_1, x_2, \ldots, x_n\}$. We want to find a number, a, such that

$$S = \sum_{i=1}^{n} (x_i - a)^2 \quad \text{is as small as possible.}$$


That is, we want to minimize the sum of the squared deviations from a. Such estimates,
when they exist, are known as least squares estimates. In this case letting the derivative of
S with respect to a equal 0 gives

$$\frac{dS}{da} = -2\sum_{i=1}^{n} (x_i - a) = 0 \quad \text{with the solution}$$

$$\hat{a} = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x}, \quad \text{the mean of the data set.}$$

So the sum of the squared deviations, $\sum_{i=1}^{n} (x_i - a)^2$, is minimized when $a = \bar{x}$.
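The result can be checked numerically. The following is a minimal sketch in Python; the four data values are hypothetical, chosen only for illustration, and the function name `sum_of_squares` is not from the text.

```python
def sum_of_squares(data, a):
    """Return S(a), the sum of squared deviations of the data from a."""
    return sum((x - a) ** 2 for x in data)

# Hypothetical data set, used only to illustrate the result.
data = [2.0, 3.5, 5.0, 7.5]
x_bar = sum(data) / len(data)

# S(a) evaluated at the mean and at two nearby values of a.
for a in (x_bar - 0.5, x_bar, x_bar + 0.5):
    print(f"a = {a:.2f}   S(a) = {sum_of_squares(data, a):.2f}")
```

Moving a away from the mean in either direction increases S; in fact $S(a) = S(\bar{x}) + n(a - \bar{x})^2$, which follows by expanding the square.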


Example 4.16.2 


A researcher suspects that the achievement score on a standard mathematics examination, Y,
for a group of students is a linear function of the student’s IQ score, X. In order to investigate
this hypothesis, data are collected and a model of the situation is presumed. In this case the
model is composed of a linear part, $a + bx_i$, reflecting the researcher’s hypothesis, and a
random part, $\epsilon_i$, reflecting the fact that the relationship between Y and X may be subject to
other factors that are not accounted for in the experiment. The random part $\epsilon_i$ is in fact an
observation of a random variable. Often this is a normal random variable as we will see.
The model chosen is

$$y_i = a + bx_i + \epsilon_i, \quad i = 1, 2, \ldots, n.$$

Here $y_i$ is the ith observation of Y, and $x_i$ is the ith observation of X.

There are two fundamental problems here: one is the estimation, from the data, of the
unknowns a and b; the other is, given a and b, does the line fit the data well or not?


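To answer the first question, we will use the principle of least squares to estimate the
parameters a and b. This principle chooses values of a and b that minimize a sum of squares,
namely,

$$S = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2.$$

We take the partial derivatives of S with respect to a and b and equate each to 0:

$$\frac{\partial S}{\partial a} = 2\sum_{i=1}^{n} (y_i - a - bx_i)(-1) = 0$$

$$\frac{\partial S}{\partial b} = 2\sum_{i=1}^{n} (y_i - a - bx_i)(-x_i) = 0.$$

Summing and simplifying gives

$$\sum_{i=1}^{n} y_i = n\hat{a} + \hat{b}\sum_{i=1}^{n} x_i \quad \text{and}$$

$$\sum_{i=1}^{n} x_i y_i = \hat{a}\sum_{i=1}^{n} x_i + \hat{b}\sum_{i=1}^{n} x_i^2. \qquad (4.2)$$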
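Equations (4.2) are called least squares equations. Their simultaneous solution is

$$\bar{y} = \hat{a} + \hat{b}\bar{x}$$

and

$$\hat{b} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$

Usually $\hat{b}$ is found and then $\hat{a}$ is found from the equation $\bar{y} = \hat{a} + \hat{b}\bar{x}$.
Suppose now that the data collected is as follows: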
Suppose now that the data collected is as follows: 


Math. score    92     86     104    109    75     100    91     110    128
IQ             104    91     123    102    86     99     92     114    99


A scatter plot of the data points is shown in Figure 4.18. The data appears to be some- 
what linear, with considerable variation. 
Substituting in (4.2) we find that the least squares estimates for a and b are as follows:

$$\hat{a} = 63.0792 \quad \text{and} \quad \hat{b} = 0.38244$$

so that the least squares line, which is called the regression of Y on X, is

$$\hat{y}_i = 63.0792 + 0.38244x_i.$$
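These estimates can be reproduced directly from the closed-form solution of the least squares equations. The sketch below is in Python and assumes, as the labelled data table and the reported estimates indicate, that the math score plays the role of x and the IQ score the role of y; the function name `least_squares_line` is illustrative only.

```python
def least_squares_line(x, y):
    """Solve the two least squares equations for the intercept and slope."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = sum_y / n - b * sum_x / n   # from y-bar = a + b * x-bar
    return a, b

# Data from the example: x = math score, y = IQ (assumed roles, see above).
math_score = [92, 86, 104, 109, 75, 100, 91, 110, 128]
iq = [104, 91, 123, 102, 86, 99, 92, 114, 99]

a_hat, b_hat = least_squares_line(math_score, iq)
print(f"a = {a_hat:.4f}, b = {b_hat:.5f}")
```

Computing the slope first and then the intercept from the means mirrors the order suggested in the text.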


Figure 4.19 shows this line plotted with the data points. 

The line does not appear to predict the Y values very well, so we consider whether or 
not the line fits the data satisfactorily. 

We begin with the values the line predicts for the X values in the data set. We show a 
table below of the data points, the predicted values, and the residuals (the observed Y values 
minus the predicted Y values). 


Figure 4.18 Scatter plot of data. (Horizontal axis: Math score.)
Figure 4.19 Regression line and data points. (Horizontal axis: Math score.)
MathScore (X)   92      86      104      109      75      100      91      110      128
IQ (Y)          104     91      123      102      86      99       92      114      99
Prediction      98.26   95.97   102.85   104.77   91.77   101.32   97.88   105.15   112.03
Residual        5.74    -4.97   20.15    -2.77    -5.76   -2.32    -5.88   8.85     -13.03


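The absolute size of these residuals does not tell much except, as we previously noted,
some of the residuals are quite large. To show a specific test of the hypothesis that the
straight line fits the data well, consider the total sum of squares of the residuals:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \quad \text{where } \hat{y}_i = \hat{a} + \hat{b}x_i.$$

Now we show a remarkable identity by adding and subtracting $\bar{y}$:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} [(y_i - \bar{y}) - (\hat{y}_i - \bar{y})]^2$$

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 - 2\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{y}) + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2.$$

Using (4.2) the last two terms can be combined and we have

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \quad \text{or}$$

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2. \qquad (4.3)$$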
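The partition of the total sum of squares can be verified numerically for the data of this example. The following Python sketch recomputes the fit (again taking math score as x and IQ as y, an assumption based on the labelled table) and evaluates the three sums of squares; the variable names are illustrative.

```python
x = [92, 86, 104, 109, 75, 100, 91, 110, 128]   # math score
y = [104, 91, 123, 102, 86, 99, 92, 114, 99]    # IQ

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept in deviation form.
b_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
a_hat = y_bar - b_hat * x_bar
y_hat = [a_hat + b_hat * xi for xi in x]

ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

print(f"total      = {ss_total:.3f}")
print(f"regression = {ss_regression:.3f}")
print(f"error      = {ss_error:.3f}")
print(f"regression + error = {ss_regression + ss_error:.3f}")
```

The cross-product term vanishes only for the least squares estimates, so the identity generally fails if any other line is substituted for the fitted one.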
This identity is an example of an analysis of variance identity. Such identities arise
frequently in the analysis of experimental data. The terms have individual interpretations
with respect to the regression problem:

$\sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares,

$\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ is the sum of squares due to regression, and

$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is the residual or error sum of squares.


The identity (4.3) then partitions the total sum of squares into two parts: the sum of
squares due to regression and the residual sum of squares. It is beyond the scope of this
book, but, presuming the error term, $\epsilon_i$, in the original model to be normally distributed
with mean 0 and variance $\sigma^2$, and presuming that there is no linear relationship ($b = 0$),
the sum of squares due to regression, divided by $\sigma^2$, can be shown to have
a chi-squared distribution with 1 degree of freedom; it can be shown that the error sum of
squares, divided by $\sigma^2$, is also a chi-squared variable, with $n - 2$ degrees of freedom;
moreover the two chi-squared variables are independent. It follows that the ratio of the
chi-squared variables, each divided by its degrees of freedom, is an F variable. This
information is usually exhibited in an analysis of variance table:


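Analysis of variance table

$$
\begin{array}{lcccc}
\text{Source} & \text{Sum of Squares} & \text{df} & \text{Mean Square} & F(1, n-2) \\
\hline
\text{Regression} & \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 & 1 & \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 / 1 & \dfrac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 / 1}{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n-2)} \\
\text{Error} & \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 & n-2 & \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n-2) & \\
\text{Total} & \sum_{i=1}^{n} (y_i - \bar{y})^2 & n-1 & &
\end{array}
$$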


If the data points are linearly related, then we expect some of the predicted values to
differ significantly from the average of the y values, so we expect $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ to be large
and we would expect the error sum of squares, $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, to be small since the
predicted values and the observed values should be close together. So, if the regression is
truly linear, we expect the F ratio to be large. This leads to a one-tailed test of the hypothesis
that the data follow a linear relationship.

The analysis of variance table for the data in this example is as follows: 


Analysis of variance table

Source        Sum of Squares    Degrees of Freedom    Mean Square    F(1, 7)
Regression    284.368           1                     284.368        2.5117
Error         792.521           7                     113.217
Total         1076.889          8
The test is assessed by calculating P[F(1, 7) > 2.5117] = 0.157022. The test in this
case can be shown to be a one-tailed test with the rejection region in the right-hand tail. We
conclude that the regression is not significant in this case. The analysis shows that there is
more random scatter in the data than there is a linear relationship; this might be suspected
from the sums of squares above since the sum of squares for regression is 284.368 while
the error sum of squares, 792.521, is much larger.
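The F ratio and its tail probability can be computed with only the Python standard library. The sketch below is one possible approach, not the method used in the text: it relies on the fact that an F(1, m) variable is the square of a Student t variable with m degrees of freedom, and it integrates the t density with Simpson's rule. The function names and the number of integration steps are assumptions made for illustration.

```python
import math

def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def f_upper_tail_one_df(f_value, df2, steps=2000):
    """P[F(1, df2) > f_value], using F(1, df2) = T^2 with T ~ t(df2)."""
    upper = math.sqrt(f_value)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df2) + t_pdf(upper, df2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df2)
    central = 2 * total * h / 3            # P(-upper < T < upper), by symmetry
    return 1 - central

# Sums of squares from the analysis of variance table above.
ss_regression, ss_error, n = 284.368, 792.521, 9
f_ratio = (ss_regression / 1) / (ss_error / (n - 2))
p_value = f_upper_tail_one_df(f_ratio, n - 2)
print(f"F = {f_ratio:.4f}, P[F(1, {n - 2}) > F] = {p_value:.4f}")
```

A statistical package would normally supply the F distribution directly; the integration here simply avoids depending on one.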

The ratio of the sum of squares due to regression to the total sum of squares is called
the coefficient of determination or the square of the correlation coefficient, r:

$$r^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}.$$

Since the numerator in $r^2$ is at most equal to the denominator, it follows that
$0 \le r^2 \le 1$, so that $-1 \le r \le 1$.

In this example, $r^2 = 0.264065$ or $r = 0.5139$. So only about 26% of the total variation
in the y values is due to a linear relationship; 74% is due to randomness.
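The correlation coefficient can also be computed directly from the data as the sum of cross products divided by the square root of the product of the two sums of squared deviations; for a straight line fitted by least squares its square equals the ratio above. A short Python sketch for the data of this example follows (math score as x and IQ as y, as in the labelled table; variable names are illustrative).

```python
import math

x = [92, 86, 104, 109, 75, 100, 91, 110, 128]   # math score
y = [104, 91, 123, 102, 86, 99, 92, 114, 99]    # IQ

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)       # the total sum of squares

r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2
print(f"r = {r:.4f}, r^2 = {r_squared:.6f}")
```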

Computer algebra systems and statistical analysis packages make the calculation of the 
analysis of variance table easy; this should always be done since a least squares fit without 
a test for linearity is quite meaningless. The interpretation of a possible linear relationship 
based on the correlation coefficient alone is not advised. We will return to a study of least 
squares linear regression in Chapter 7. 


EXERCISES 4.16 


1. The following data represent the weight, x, in units of 1000 lb, and y, the fuel consump- 
tion in gallons per 100 miles, for six different brands of automobiles: 


x 3.4 4.1 2.6 2.0 1.9 3.4 
y Se) 6.5 3.6 2.9 3.1 4.9 


(a) Make a scatter plot of the data. 

(b) Fit a least squares regression line to the data. 

(c) Show the analysis of variance table and state the conclusions that can be drawn 
from it. 

2. Ohm’s law can be written in the form of a regression as $I = \beta_0 + \beta_1 V$ where $\beta_0 = 0$ and
$\beta_1 = 1/R$. Since V is set by the experimenter, this can be thought of as the independent
variable while I is viewed as the dependent variable. Data from an experiment on one
wire are as follows:

V    0.5     1.0     1.5     1.8     2.0
I    0.52    1.19    1.62    2.00    2.40

(a) Plot the data. Do there appear to be any unusual points?

(b) Find the least squares fit for the data. 

(c) Interpret the analysis of variance table and state conclusions this has for the exper- 
iment. 

3. A recent paper reported a study on the relationship between applied stress (the inde- 
pendent variable, X, in kg/mm) and the time to fracture (the dependent variable, Y, 
in hours) for a type of stainless steel under uniaxial stress in a solution at a constant 
temperature. Ten different settings of applied stress were used and the following data 


resulted: 

X Y 
1 2 63 
2 5.0 58 
3 10.0 55 
4 15.0 61 
5 17.5 62 
6 20.0 37 
7 25.0 38 
8 30.0 45 
9 35.0 46 
10 45.0 19 


(a) Plot the data. Do there appear to be any unusual points? 

(b) Find the least squares fit for the data. 

(c) Interpret the analysis of variance table and state any conclusions that can be drawn 
from it. 


4. A small study on productivity in a factory compared hours worked (x) with parts assem- 
bled (y). The data are as follows: 


x y 


AB wWN eR 
BDUNWNWN 


(a) Find the equation of the least squares regression line. 


(b) Show the analysis of variance table and state any conclusions that can be drawn 
from it. 


(c) Find the correlation coefficient. 


5. The following data represent the total number of items produced by a manufacturing 
process (Y) and the total cost involved in production (X). 
(a) Show a scatter plot of the data.

(b) Find the equation of the least squares regression line predicting Y from X. 

(c) Find the equation of the least squares regression line predicting X from Y. Explain 
why the answer here is not equivalent to the answer in part (b). 


6. A study was done to compare engine size (measured by cubic inches of displacement)
and miles per gallon estimates for eight compact automobiles. The data are as follows:

CID (X)    121    120    97    98    122    97    85    122
MPG (Y)    30     31     34    27    29     34    38    32


(a) Find the equation of the least squares regression line predicting Y from X. 


(b) Show the analysis of variance table and discuss any conclusions that can be drawn 
from it. 


(c) Find the correlation coefficient. 


7. Raw material used in the production of a synthetic fiber is stored in a place that has no
humidity control. Measurements of the relative humidity in the storage place and the
moisture content of a sample of the raw material (both in percentages) on 12 days were
as follows:

Humidity (X)    46    53    37    42    34    29    60    44    41    48    33    40
Moisture (Y)    12    14    11    13    10    8     17    12    10    15    9     13


(a) Find the equation of the least squares regression line predicting Y from X. 
(b) Find the correlation coefficient. What is the interpretation of this number? 


8. The yield of a chemical process is thought to be a function of the amount of catalyst
added to the reaction. An experiment gave the following data:

Yield (Y)       60.54    63.86    63.76    60.15    66.66    71.66    70.81    65.72
Catalyst (X)    0.9      1.4      1.6      1.7      1.8      2.0      2.1      2.3


(a) Find the least squares regression line predicting Y from X. 
(b) Find the correlation coefficient. 


9. An experimenter has a data set $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ and wishes to fit an
equation of the form $y_i = \beta x_i$ to the data.

(a) Use the principle of least squares to find a formula for $\hat{\beta}$, the least squares estimator
for $\beta$.
(b) Use the result in part (a) to calculate $\hat{\beta}$ for the data:

10. An experimenter wishes to fit an equation of the form $y_i = \alpha + \dfrac{\beta}{x_i}$ to the data set
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$. Show the least squares equations and show how to use
these to estimate $\alpha$ and $\beta$.
11. In a regression situation it is known that the regression line passes through the origin.
The model is as follows:

$$y_i = \beta x_i + \epsilon_i, \quad i = 1, 2, \ldots, n,$$

where the $\epsilon_i$'s are distributed independently and normally with mean 0 and variance $\sigma^2$.

(a) Find the least squares estimator of $\beta$, $\hat{\beta}$.
(b) Let $a_i = \dfrac{x_i}{\sum_{j=1}^{n} x_j^2}$, a constant for each value of i. Find $E[\hat{\beta}]$ and $\mathrm{Var}[\hat{\beta}]$.
(c) In the usual regression situation $\sum_{i=1}^{n} (y_i - \hat{y}_i) = 0$. Show that this is not necessarily
true in this case.

(d) Show that:

1] $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} y_i^2 - \hat{\beta}^2 \sum_{i=1}^{n} x_i^2$ and

2] $E\left[\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\right] = (n - 1)\sigma^2.$


12. Suppose that an experimenter wishes to fit a straight line of the form $y_i = a + mx_i$ to
the data set $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where m is a known constant.

(a) Find the least squares estimate of a, $\hat{a}$.
(b) Show that $E(\hat{a}) = a$.


4.17 QUALITY CONTROL CHART FOR $\bar{X}$


Manufacturers often monitor product quality through periodic sampling during a produc- 
tion process. The samples taken are usually small in size and are frequently reduced to 
simple statistics, such as the sample mean or range, for each sample. These statistics are 
then plotted in a graph indicating the time series of the measurements so that monitoring of 
the process can be done as time progresses. Such charts are called quality control charts. We 
will consider one type of quality control chart in this section and some of the mathematics 
behind the analysis of the data collected. We consider a specific example. 


Example 4.17.1 


A manufacturer of ball bearings takes periodic random samples of size 4 from the production
line and measures the mean diameter of the ball bearings, $\bar{X}$, for each sample. The
sample data (which has been coded for convenience) is shown below together with $\bar{X}$ and
the sample standard deviation, s, for each sample.

Data                 $\bar{X}$    s
3, -1, -6, 4         0        4.54606
9, 0, 3, -2          2.5      4.79583
-12, 4, -9, -6       -5.75    6.94622
11, 9, 4, 1          6.25     4.57347
-1, -2, 4, -1        0        2.70801
8, 1, -2, 3          2.5      4.20317
-1, -4, -9, -3       -4.25    3.40343
-8, -3, -6, -4       -5.25    2.21736
1, -2, -2, 1         -0.5     1.73205
-2, -2, -3, -2       -2.25    0.50000
0, 4, 1, -2          0.75     2.50000

Now the sample means are plotted in time order sequence in Figure 4.20.

Figure 4.20 A control chart for sample means.

Now what does the chart tell us about the process? First we seek a central value for
the means. The sample size is small and generally nothing is known about the mean of
the population from which the samples were drawn (in fact that population may be changing,
resulting in a change in the population mean, which is one reason the control chart is
being used). Since the central limit theorem indicates that the means are approximately normal,
this central value is taken as the mean of the sample means, $\bar{\bar{X}}$. Our second problem
concerns the variation in the measurements. If $\sigma^2$ were known, then we could find a confidence
interval for the true mean, $\mu$, but we don’t know this variance. Usually, an estimate
of the confidence interval is found as

$$\bar{\bar{X}} \pm 3\frac{\hat{\sigma}}{\sqrt{n}}$$

where $\hat{\sigma}$ is an estimate of the unknown standard deviation. The multiplier 3 is commonly
used in industry and produces about 0.0027 of the data outside the limits. It is then fairly
safe to assume that an observation outside the limits did not arise by chance, but is due to
some alteration in the production process. The limits are called upper and lower control
limits.

Now, how is o estimated? There are many ways to do this; we show here a method 
based on the sample standard deviations. 

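We know that

$$\frac{(n - 1)s^2}{\sigma^2} \sim \chi^2_{n-1} \quad \text{and that } E(s^2) = \sigma^2$$

and, with large samples, $E(s) \approx \sigma$, but this, unfortunately, is not true with small samples,
so we will find an expression for E(s) that will hold for any sample size. We find first the
distribution of $\chi_{n-1}$.

Suppose then that X is a $\chi^2_r$ random variable so that

$$f(x) = \frac{1}{\Gamma\left(\frac{r}{2}\right) 2^{r/2}}\, x^{\frac{r}{2} - 1} e^{-x/2}, \quad x > 0 \quad \text{and let } Y = \sqrt{X}.$$

Then $G(y) = P(Y \le y) = P(\sqrt{X} \le y) = P(X \le y^2) = F(y^2)$ so

$$g(y) = 2y \cdot f(y^2) \quad \text{or}$$

$$g(y) = \frac{2}{\Gamma\left(\frac{r}{2}\right) 2^{r/2}}\, y^{r-1} e^{-y^2/2}, \quad y > 0 \quad \text{and}$$

$$E[Y] = \frac{2}{\Gamma\left(\frac{r}{2}\right) 2^{r/2}} \int_0^{\infty} y^{r} e^{-y^2/2}\, dy.$$

Letting $z = y^2/2$ in the integral,

$$E(Y) = \frac{\sqrt{2}}{\Gamma\left(\frac{r}{2}\right)} \int_0^{\infty} z^{\frac{r-1}{2}} e^{-z}\, dz = \frac{\sqrt{2}}{\Gamma\left(\frac{r}{2}\right)}\, \Gamma\left(\frac{r + 1}{2}\right).$$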


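Now letting $r = n - 1$, and using the fact that $s = \sigma \chi_{n-1} / \sqrt{n - 1}$,

$$E(s) = \sqrt{\frac{2}{n - 1}}\; \frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n - 1}{2}\right)}\, \sigma.$$

The factor $\sqrt{\dfrac{2}{n - 1}}\, \dfrac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}$ is denoted by $c_4$ in quality control literature. A table of
values of $c_4$ is shown in Table 4.1. While practical interest centers on small values for n, note
that $c_4$ approaches 1 quite rapidly.
A graph of these values is shown in Figure 4.21.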
Table 4.1

n     c4
2     0.797885
3     0.886227
4     0.921318
5     0.939986
6     0.951533
7     0.959369
8     0.965030
9     0.969311
10    0.972659
11    0.975350
12    0.977559
13    0.979406
14    0.980971
15    0.982316
16    0.983484
17    0.984506

Figure 4.21 Factors $c_4$ for a quality control chart. (Horizontal axis: n.)
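The tabulated factors follow from the gamma function expression for E(s) and can be regenerated with a few lines of Python. This is a sketch using the standard library's `math.gamma`; the function name `c4` is chosen here to match the quality control notation.

```python
import math

def c4(n):
    """Factor c4 with E(s) = c4 * sigma for a normal sample of size n."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

# Reproduce the table for n = 2, ..., 17.
for n in range(2, 18):
    print(f"{n:2d}  {c4(n):.6f}")
```

For very large n the gamma function values overflow a floating-point number; in that case `math.lgamma` and a difference of logarithms would be the safer choice.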


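For the data in this example, $\bar{\bar{X}} = -0.54545$ and the average of the sample standard
deviations is $\bar{s} = 3.46596$, so

$$\hat{\sigma} = \frac{\bar{s}}{c_4} = \frac{3.46596}{0.921318} = 3.76196 \quad \text{giving control limits}$$

$$UCL = -0.5455 + 3 \cdot \frac{3.76196}{\sqrt{4}} = 5.0974 \quad \text{and}$$

$$LCL = -0.5455 - 3 \cdot \frac{3.76196}{\sqrt{4}} = -6.1884.$$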


These are the limits shown on the control chart in Figure 4.20. We see that the fourth 
sample has mean 6.25 which exceeds the upper control limit. Many manufacturers would 
investigate the production process at that point. 
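The whole calculation for this example can be sketched in Python with the standard library. The sample values are those tabulated for the example; the variable names are illustrative, and the small differences in the last digit relative to the figures in the text come from rounding of intermediate values.

```python
import math
from statistics import mean, stdev

# Coded diameters: eleven samples of size 4 from the example.
samples = [
    [3, -1, -6, 4], [9, 0, 3, -2], [-12, 4, -9, -6], [11, 9, 4, 1],
    [-1, -2, 4, -1], [8, 1, -2, 3], [-1, -4, -9, -3], [-8, -3, -6, -4],
    [1, -2, -2, 1], [-2, -2, -3, -2], [0, 4, 1, -2],
]
n = 4

# c4 factor for samples of size n (gamma function form of E(s)/sigma).
c4 = math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

grand_mean = mean(mean(s) for s in samples)      # mean of the sample means
s_bar = mean(stdev(s) for s in samples)          # average sample standard deviation
sigma_hat = s_bar / c4                           # estimate of sigma

half_width = 3 * sigma_hat / math.sqrt(n)
ucl = grand_mean + half_width
lcl = grand_mean - half_width

# Sample numbers (1-based) whose means fall outside the control limits.
out_of_control = [i + 1 for i, s in enumerate(samples)
                  if not lcl <= mean(s) <= ucl]

print(f"UCL = {ucl:.4f}, LCL = {lcl:.4f}, flagged samples: {out_of_control}")
```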
Another common method for estimating the process standard deviation is based on the
mean range of the samples. One reason for using the range is that it is easily calculated 
on the production floor. Since the range ignores all of the sample except for two values, 
it is not surprising to find that it is not as efficient as the sample standard deviations for 
estimating o. 

We see that one value of the control chart is the glimpse it gives of the production 
process through the use of a sample, which usually is small. Many other types of control 
charts are used in industry. Interested readers are referred to Duncan [9] and Grant and 
Leavenworth [14] for more information. 


EXERCISES 4.17 


1. Ten samples, shown below, were taken in order to establish control limits in an indus- 
trial process. 


Sample    Values

1         10.6    10.1    11.3    9.1
2         10.2    11.6    10.5    10.5
3         10.1    9.8     8.8     03
4         10.1    9.5     10.3    10.6
5         8.7     11.6    9.7     9.3
6         10.1    9.8     10.8    8.9
7         11.2    11.5    10.9    11.6
8         10.6    9.6     10.3    9.9
9         9.8     7.7     9.4     9.9
10        10.0    8.4     10.6    8.8


(a) Calculate the sample mean and the sample standard deviation for each sample. 
(b) Calculate upper and lower control limits using the results in part (a). 
(c) Plot the control chart for the sample means. Are any of the data points unusual? 


2. A new machine fills cereal boxes by weight. It is desired to start a control chart on the 
average weight of the boxes. Ten samples, each of size 5, are taken with the following 


results: 
Sample 
1 16.1 16.2 15.9 16.0 16.1 
2 16.2 16.4 15.8 16.1 16.2 
3 16.0 16.1 15.7 16.3 16.1 
4 16.1 16.2 15.9 16.4 16.6 
5 16.5 16.1 16.4 16.4 16.2 
6 16.8 15.9 16.1 16.3 16.4 
7 16.1 16.9 16.2 16.5 16.5 
8 15.9 16.2 16.8 16.1 16.4 
9 15.7 16.7 16.1 16.4 16.8 
10 16.2 16.9 16.1 17.0 16.4 
(a) Calculate the sample mean and the sample standard deviation for each sample.
(b) Calculate upper and lower control limits using the results in part (a). 


(c) Plot the control chart for the sample means. Are any of the data points 
unusual? 


CHAPTER REVIEW 


Drawing conclusions from samples—statistical inference—has been the subject of this 
chapter. Statistical inference is a central part of the scientific method since it involves the 
analysis of sample data gathered in the course of a scientific investigation. 

Any quantity calculated from a sample is called a statistic. Statistics are thus random
variables with probability distributions, means, and variances of their own. From the central
limit theorem, we know that the mean of a sample, $\bar{X}$, has, approximately, a normal distribution
with mean $\mu$ and variance $\sigma^2/n$, where n is the sample size and where $\mu$ and $\sigma^2$ are
the true population values; this theorem is a primary tool in establishing tests of hypotheses
on single means and on the difference between means from two populations.

The purpose of this chapter is to establish tests of hypotheses and confidence intervals 
for some common parameters of statistical distributions as well as to introduce the principle 
of least squares in fitting a linear function to a data set. In order to establish the probabil- 
ity distributions of some common statistics determined from samples, we first establish 
the probability distributions of some common functions of random variables. We begin 
with a procedure for finding the probability distribution function for a function of a ran- 
dom variable, say Y = H(X). We used the fact that g(y), the probability density function 
for Y, is 


$$g(y) = \frac{dG(y)}{dy}$$

where $G(y) = P(Y \le y) = P[H(X) \le y]$.
We then solve for X and express G(y) in terms of F(x), the distribution function for X.
We established the fact that if $X \sim N(0, 1)$ then $X^2$ has a $\chi^2_1$ distribution. We also established
the distribution of the maximum of a set of uniformly distributed random variables.
The most important function of a random variable involves sums of independent random
variables. In Section 4.4 we used the fact that

$$P(X + Y = z) = \sum_{k} P(X = k) \cdot P(Y = z - k).$$


This sum can be evaluated for a variety of probability distributions. Important facts 
here are that the sum of independent binomials with common value for p is binomial, 
and that the sum of independent Poissons is Poisson, where the parameter for the sum 
is the sum of the parameters of the individual Poissons. Binomial and Poisson variables 
are called reproductive since their sum has the same kind of distribution as the individual 
summands. 
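The reproductive property of the binomial can be checked numerically from the convolution sum. The Python sketch below uses hypothetical parameters (3 and 5 trials with common p = 0.4), chosen only for illustration; the function names are not from the text.

```python
from math import comb

def binomial_pmf(n, p):
    """Probabilities P(X = k), k = 0, ..., n, for a binomial variable."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(px, py):
    """Distribution of X + Y for independent X and Y on 0, 1, 2, ..."""
    pz = [0.0] * (len(px) + len(py) - 1)
    for i, a in enumerate(px):
        for j, b in enumerate(py):
            pz[i + j] += a * b          # term P(X = i) * P(Y = z - i) with z = i + j
    return pz

# Hypothetical parameters: X ~ binomial(3, 0.4), Y ~ binomial(5, 0.4).
p_sum = convolve(binomial_pmf(3, 0.4), binomial_pmf(5, 0.4))
p_direct = binomial_pmf(8, 0.4)

for z, (u, v) in enumerate(zip(p_sum, p_direct)):
    print(f"z = {z}: convolution {u:.6f}, binomial(8, 0.4) {v:.6f}")
```

The two columns agree, as expected when the binomials share the same value of p; with different values of p the sum is no longer binomial.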

Variables are not commonly reproductive, however, and we calculated the probability 
distribution for the sum of independent uniform variables from some special cases. The 
sums appear to become normal; that is indeed the case, a fact that is established in subse- 
quent parts of the chapter. 




The probability distribution of a random variable can be summarized by the probability
generating function for the random variable. If it exists, this function (which we denoted
by $P_X(t)$) is the expected value of a special function of X, $t^X$, so

$$P_X(t) = E(t^X). \quad \text{It follows that}$$

$$P_X(t) = \sum_{x} t^x \cdot P(X = x)$$

since we used this function only for discrete random variables.

Sections 4.5 and 4.6 establish some properties of probability generating functions.
These were, supposing A(t) and B(t) to be probability generating functions for variables
X and Y, respectively:

1] The coefficient of $t^k$ in a power series expansion of $P_X(t)$ is $P(X = k)$. It is in this
sense that $P_X(t)$ summarizes or characterizes the random variable since it is possible to
find all the probabilities from $P_X(t)$.
2] $E(X) = P'_X(1)$.
3] $\mathrm{Var}(X) = P''_X(1) + P'_X(1) - [P'_X(1)]^2$.
4] The coefficient of $t^k$ in $A(t) \cdot B(t)$ is $P(X + Y = k)$.

This fact is of great importance in establishing probability distributions for sums. 

Probability generating functions for some specific random variables are derived in
Section 4.5. There it is found that

$$P_X(t) = (q + pt)^n \quad \text{for a binomial variable with parameters n and p, and}$$

$$P_X(t) = \frac{pt}{1 - qt} \quad \text{for a geometric random variable with parameter p.}$$

We then discussed a series of dependent binomial trials called Poisson’s trials in which 
the probabilities in binomial trials vary. 

The probability generating function is not often used for continuous random variables.
Instead, a function we call the moment generating function is used. It too characterizes
probability distributions. Its definition is

$$M[X; t] = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} \cdot f(x)\, dx$$

provided of course that the integral exists. The moment generating function generates
moments (although it is uncommon to use it for this purpose) in either of two ways: the
coefficient of $t^k/k!$ in the power series expansion of $M[X; t]$ is $E(X^k)$, or

$$\left. \frac{d^k M[X; t]}{dt^k} \right|_{t=0} = E(X^k).$$

We calculated some specific moment generating functions:

If $f(x) = 1$, $0 < x < 1$, then $M[X; t] = \dfrac{1}{t}(e^t - 1)$.

If $f(x) = \lambda e^{-\lambda x}$, $x > 0$, then $M[X; t] = \dfrac{\lambda}{\lambda - t}$.

If $X \sim N(0, 1)$, then $M[X; t] = e^{t^2/2}$.


This last fact takes on enormous importance since, if we can show that a moment gen- 
erating function approaches that of the standard normal, we can conclude that the variable 
approaches the standard normal distribution. In this sense the moment generating function
is of great importance since it can be used to establish the distributions of functions of 
random variables, and, in particular, their sums. 

These properties of moment generating functions are of importance:

1] $M[c + X; t] = e^{ct} M[X; t]$ and
2] $M[cX; t] = M[X; ct]$.

These facts help establish the fact that if $X \sim N(\mu, \sigma)$ then $M[X; t] = e^{\mu t + \sigma^2 t^2 / 2}$.
Sums of random variables are considered in Section 4.10; it is found that under 
quite general conditions these approach normality as the number of summands increases. 


Examples were shown that demonstrate that 


Sums of normals are normal. 
Sums of exponentials are normal. 


Sums of binomials are normal. 


These facts help explain the persistence of normality in many examples earlier in 
the book. 

In Section 4.11 we state and demonstrate the central limit theorem: If X has mean $\mu$
and variance $\sigma^2$, and if X has a moment generating function, then $\bar{X} \to N(\mu, \sigma/\sqrt{n})$.

Before examining confidence intervals and tests of hypotheses, we showed that the
probability distribution of the sample variance can be determined from the fact that

$$\frac{(n - 1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$

when the sample of size n is chosen from a normal distribution with known variance $\sigma^2$.


The central limit theorem allows us to test hypotheses on single means. We also con- 
sidered the distribution of the sample mean when the population variance is unknown 
and finally the distribution of the difference between sample means. The tests of vari- 
ous hypotheses considered are summarized here. Critical regions are presumed to be con- 
structed so that the size of the test is a. 

To test $H_0: \mu = \mu_0$ against $H_a: \mu \ne \mu_0$, the best critical region is $\bar{X} > a$ or $\bar{X} < b$.
One-sided tests are used in testing one-sided alternatives. If the population standard deviation,
$\sigma$, is known, then a and b can be determined using the fact that $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is a standard normal
variable.

To test $H_0: \mu = \mu_0$ against $H_a: \mu \ne \mu_0$ when the population standard deviation is
unknown, the best critical region is $\bar{X} > a$ or $\bar{X} < b$, where values for a and b can be
determined using the fact that $\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ follows a $t_{n-1}$ distribution.

If two samples are drawn and both population variances are known, and the
hypothesis $H_0: \mu_x = \mu_y$ is to be tested against $H_a: \mu_x \ne \mu_y$, then the test statistic is

$$z = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\sigma_x^2}{n_x} + \dfrac{\sigma_y^2}{n_y}}}$$

where z is a standard normal random variable.
If the population variances are unknown but can be presumed to be equal and if the
samples are chosen from normal populations then

$$t_\nu = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{s_p \sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}} \quad \text{where}$$

$$s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2} \quad \text{and}$$

$$\nu = n_x + n_y - 2.$$


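If the population variances are unknown and known to be unequal then no exact test is
known for the hypothesis that the population means are equal. An approximate test, due to
Welch, is

$$T_\nu = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}} \quad \text{where}$$

$$\nu = \frac{\left(\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}\right)^2}{\dfrac{\left(s_x^2/n_x\right)^2}{n_x - 1} + \dfrac{\left(s_y^2/n_y\right)^2}{n_y - 1}}.$$

A test of $H_0: \sigma_x^2 = \sigma_y^2$ against the alternative $H_a: \sigma_x^2 \ne \sigma_y^2$ is based on the fact that

$$\frac{s_x^2 / \sigma_x^2}{s_y^2 / \sigma_y^2} = F(n_x - 1, n_y - 1).$$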


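We then considered simple linear regression, or the fitting of data to a straight line of
the form $y_i = a + bx_i$, $i = 1, 2, \ldots, n$. The principle of least squares chooses those estimates
that minimize

$$S = \sum_{i=1}^{n} (y_i - a - bx_i)^2.$$

The result is a set of least squares equations:

$$\sum_{i=1}^{n} y_i = n\hat{a} + \hat{b}\sum_{i=1}^{n} x_i \quad \text{and}$$

$$\sum_{i=1}^{n} x_i y_i = \hat{a}\sum_{i=1}^{n} x_i + \hat{b}\sum_{i=1}^{n} x_i^2.$$

Their simultaneous solution is:

$$\hat{b} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

with $\hat{a}$ then found from $\bar{y} = \hat{a} + \hat{b}\bar{x}$.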
Finally in this chapter we considered quality control charts for sample means. This
chart plots means calculated from periodic samples and establishes upper and lower control
limits; a sample mean outside these limits indicates that the process may be out of control.
These limits are $\bar{\bar{X}} \pm 3\dfrac{\hat{\sigma}}{\sqrt{n}}$ where
$\hat{\sigma}$ is an estimate of the unknown population standard deviation, $\sigma$. Table 4.1 gives divisors of
the average sample standard deviations which are used to find $\hat{\sigma}$.


PROBLEMS FOR REVIEW 


Exercises 4.3 # 1, 3, 4, 8. 
Exercises 4.4 # 1, 2, 4. 

Exercises 4.5 # 1, 4. 

Exercises 4.7 # 1, 2, 4, 6, 7, 10. 
Exercises 4.8 # 2, 3, 4, 5, 6, 9. 
Exercises 4.10 # 1, 2, 5, 9. 
Exercises 4.11 #1, 2, 4,5 
Exercises 4.13 # 1, 2, 5, 7 
Exercises 4.14 # 1, 2, 5, 6, 8, 10, 17 
Exercises 4.15 # 1, 2, 3, 6, 8, 9, 11, 14, 15, 19 
Exercises 4.16 # 1, 3, 5, 9. 
Exercises 4.17 #1 


SUPPLEMENTARY EXERCISES FOR CHAPTER 4 


1. For the triangular distribution $f(x) = \dfrac{2}{a}\left(1 - \dfrac{x}{a}\right)$, $0 < x < a$,


(a) Find the moment generating function. 


(b) Use the moment generating function to find the mean and variance of X and 


check these results by direct calculation. 
2. Find the mean and variance of X where X has the Pareto distribution, $f(x) = a \cdot b^a \cdot x^{-(a+1)}$, $a > 0$, $b > 0$, $x > b$.

3. Consider the truncated exponential distribution, $f(x) = e^x$, $0 < x < \ln 2$.


(a) Find the moment generating function for X and expand it in a power series. 
(b) From the series in part a], find the mean and variance of X. 


. Random variable X denotes the number of green marbles drawn when a sample of two 


is selected without replacement from a box containing 3 green and 7 yellow marbles. 


(a) Find the moment generating function for X. 
(b) Verify that E(X+) = 1. 


5. Find E(X*) if X is a Weibull random variable with parameters a and f. 


6. Let S = )_, X;, where X; is a uniform random variable on the interval (0,1). Find 


10. 


11. 


12. 


13. 


the moment generating function for Z = SH where and o are the mean and stan- 

a : o ; ; : 
dard deviation, respectively, of $. Then show that this moment generating function 
approaches the moment generating function for a standard normal random variable 
as N > oo. 


. A fair quarter is tossed until it comes up heads; suppose X is the number of tosses 


necessary. If X = x, then x fair pennies are tossed; let Y denote the number of heads 
on the pennies. Find P(Y = 3), simplifying the result as much as you can. 


. The coin loaded so as to come up heads 1/3 of the time is tossed until a head appears. 


This is followed by the toss of a coin loaded so as to come up heads with a probability 
of 1/4 until that coin comes up heads. 


(a) Find the probability distribution of Z, the total number of tosses necessary. 
(b) Find the mean and variance of Z. 


. Customers at a gasoline station buy regular or premium unleaded gasoline with prob- 


abilities p and g = | — p, respectively. The number of customers in a daily period is 
Poisson with mean y. Find the probability distribution for the number of customers 
buying regular unleaded gasoline. 
A company claims that the actual resistance of resistors are normally distributed with 
mean 200 ohms and variance 4 - 10~* ohms?. 
(a) What is the probability that a resistor drawn at random from this set of resistors 
will have resistance greater than 200.025 ohms? 
(b) A sample of 25 resistors drawn at random from this set has an average resis- 
tance of 200.01 ohms. Would you conclude that the true population mean is still 
200 ohms? 
A sample of size n is drawn from a population about which nothing is known except 
that the variance is 4. How large a sample must be drawn so that the probability is at 
least 0.95 that the sample average, X, is within | unit of the true population mean, p? 
Suppose 12 fair dice are thrown. Let X denote the total number of spots showing on 
the 12 uppermost faces. Use the central limit theorem to estimate P(25 < X < 40). 
Mathematical and verbal SAT scores are, individually, N(500, 100). 
(a) Find the probability that the total mathematical plus verbal SAT score for an 
individual is at least 1100, assuming that the scores are independent. 


(b) What is the probability that the average of five individual total scores is at least 
1100? 
14. A student makes 100 check transactions in a period covering his bank statement. Rather than subtract the amount he spends exactly, he rounds each checkbook entry off to the nearest dollar. Assume that the errors are uniformly distributed on $[-\frac{1}{2}, \frac{1}{2}]$. What is the probability the total error is more than $5?

15. The time a construction crew takes to construct a building is normally distributed with mean 90 days and standard deviation 10 days. After construction, it takes additional time to install utilities and finish the interior. Assume the additional time is independent of the construction time, and is normally distributed with mean 30 days and standard deviation 5 days.

(a) Find the probability it takes at least 101 days for the construction of only a building.
(b) Find the probability it takes an average of 101 days for the construction of only four buildings.
(c) What is the probability that the total completion time for one building is at most 130 days?
(d) What is the probability that the average additional completion time for five buildings is at least 35 days?

16. A random variable X has the probability distribution function

$$f(x) = \frac{1}{3} \quad \text{for } x = -1, 0, \text{ or } 1.$$

(a) Find M[X; t], the moment generating function for X.
(b) If $X_1$ and $X_2$ are independent observations of X, find $M[X_1 + X_2; t]$ without first finding the probability distribution of $X_1 + X_2$.
(c) Verify the result in part (b) by finding the probability distribution of $X_1 + X_2$.

17. A random variable X has the probability density function f(x) = 2(1 − x), 0 < x < 1.

(a) Find the moment generating function for X.
(b) Use the moment generating function to find a formula for $E(X^k)$.
(c) Let Y = 3(X + 1). Find M[Y; t].

18. A random variable X has $M[X; t] = e^{6t + 32t^2}$. Find P(−4 < X < 16).

19. A discrete random variable X has the probability distribution function
u Loa 
2 
foy=d} x22 
3 
“ x=3 


(a) Find $\mu$ and $\sigma^2$.
(b) Find M[X; t].
(c) Verify the results in part (a) using the moment generating function.

20. Find the moment generating function for a random variable X with probability density function $f(x) = e - e^x$ if 0 < x < 1.
21.


A random variable has M[X; t] = ze! + ze + a . 
(a) What is the probability distribution function for X?
(b) Expand M[X; t] in a power series and find $\mu$ and $\sigma^2$.

22. Suppose that X is the number of 6's in $n_1$ tosses of a fair die and that Y is the number of 3's in $n_2$ tosses of another fair die. Use moment generating functions to show that S = X + Y has a binomial distribution with parameters $n = n_1 + n_2$ and p = 1/6.

23. A square law rectifier has the characteristic $Y = kX^2$, X > 0, where X and Y are the input and output voltages, respectively. If the input to the rectifier is noise with the probability density function


fa) = ao x20, f>0, 


find the probability density function of the output. 
24. If X ~ N(0, 1),
(a) find $P(X^2 > 5)$.
(b) Suppose X,,X>,...,Xg are independent observations of X. Find P(X? +X} + 
aE: x > 10. 


25. Suppose X has the probability density function

$$f(x) = \begin{cases} 1 + x, & -1 < x < 0 \\ 1 - x, & 0 < x < 1. \end{cases}$$

(a) Find the probability density function of $Y = X^2$.
(b) Show that the result in part (a) is a probability density function.

26. The resistance, R, of a resistor has the probability density function

$$f(r) = \frac{r}{200} - 1, \quad 200 < r < 220.$$

A fixed voltage of 5 V is placed across the resistor.

(a) Using the fact that $V = I \cdot R$, find the probability density function of the current, I, through the resistor.
(b) What is the expected value of the current?
27. If X is uniformly distributed on [−1, 1], find the probability density function of $Y = \sqrt{1 - X^2}$.

28. Let X be uniformly distributed on [0, 1].


(a) Find the probability density function for Y = ral and prove that your result is 
a probability density function. 


(b) Explain how values of X could be used to sample from the distribution f(x) = 2x, 0 < x < 1.

29. Random variable X has the probability density function f(x) = 2x, 0 < x < 1. Let
y= =. Find E(Y) by 

(a) first finding g(y). 

(b) not using g(y). 
30. Given $f(x) = 2e^{-2x}$, for x > 0. Find the probability density function for $Y = e^{-X}$ and, from it, find $E[e^{-X}]$.

31. A random variable X has the probability density function $f(x) = \frac{2}{9}x(3 - x)$, for 0 < x < 3. Find the probability density function for $Y = X^2 - 1$.

32.


The moment generating function for arandom variable Y is M[Y; t] = ae Expand 
ML[Y; t] in a power series in ¢ and find Hy and o,. 

33. Suppose that X is uniform on [−1, 2]. Find the probability density function for Y = |X|.

34. The following data represent radiation readings, in milliroentgens per hour, taken from television display areas in different department stores: 0.40, 0.48, 0.60, 0.15, 0.50, 0.80, 0.50, 0.36, 0.16, and 0.89.


(a) Find a 95% confidence interval for y if it is known that c=. 
(b) Find a 95% confidence interval for $\mu$ if $\sigma^2$ is unknown.

35. The variance of a normally distributed industrial measurement is known to be 225. If a random sample of 14 measurements is taken and the sample variance computed, what is the probability the sample variance is twice the true variance?

36. A random sample of 21 observations is taken from a normal distribution with variance 100. What is the probability the sample variance exceeds 140?

37. A machine that produces ball bearings is sampled periodically. The mean diameter of the ball bearings produced is known to be under control, but the variability of these diameters is of concern. If the machine is working properly, the variance is 0.50 mm². If a sample of 31 measurements shows a sample variance of 0.94 mm², should the operator of the machine be concerned that something is wrong with the machine? Use $\alpha = 0.05$.

38. A manufacturer of piston rings for automobile engines assumes that piston ring diameter is approximately normally distributed. If a random sample of 15 rings has mean diameter 74.036 mm and sample standard deviation 0.008 mm, construct a 98% confidence interval for the true mean piston ring diameter.

39. A commonly used method for determining the specific heat of iron has a standard deviation 0.0100. A new method of determination yielded a standard deviation of 0.0086 based on nine test runs. Assuming a normal distribution, is there evidence at the 10% level that the new method reduces the standard deviation?

40. (a) A random sample of 10 electric light bulbs is selected from a normal population. The standard deviation of the lifetimes of these bulbs is 120 hours. Find 95% confidence limits for the variance of all such bulbs manufactured by the company.
(b) Find 95% confidence limits for the standard deviation if the sample size is 100.

41. A city draws a random sample of employees from its labor force of 5000 people. The number of years each employee has worked for the city is 8.2, 5.6, 4.7, 9.6, 7.8, 9.1, 6.4, 4.2, 9.1 and 5.6. Assume that the time employees have been employed is approximately normal. Calculate a 90% confidence interval for the average number of years an employee has worked for the city.

42. The number of ounces of liquid a soft drink machine dispenses into a bottle is a normal random variable with unknown mean $\mu$ but known variance 0.25 oz². A random sample of 75 bottles filled by this machine has mean 12.2 oz.
(a) Determine a 95% two-sided confidence interval for $\mu$.
(b) It is desired to be 99% confident that the error in estimating the mean is less than 0.1 oz. What should the sample size be?

43. The maximum acceptable level for exposure to microwave radiation in the United States is an average of 10 μW/cm². It is feared that a large television transmitter may be polluting the air by exceeding a safe level of microwave radiation.

(a) Test $H_0: \mu = 10$ against the alternative hypothesis $H_a: \mu > 10$ with $\alpha = 0.05$ if a sample of 36 readings gives a sample mean of 10.3 μW and a sample standard deviation of 2.1 μW.
(b) Find a 98% confidence interval for $\mu$.

44. A machine producing washers is found to produce washers whose variance is 30 in².

(a) A sample of 36 washers is taken and the mean diameter, $\bar{X}$, is found. Find the probability that $\bar{X}$ is within 0.1 units of $\mu$, the true mean diameter.
(b) How large a sample is necessary so that the probability $\bar{X}$ is within 0.2 units of $\mu$ is 0.90?

45. (a) To determine with 94% confidence the average hardness of a large number of selenium-alloy ball bearings, how many would have to be tested to obtain an estimate within 0.009 units of the true mean hardness if $\sigma^2$ is known to be 0.0016?
(b) A small study of five bearings in part (a) gives $\bar{X} = 2.057$. What is the probability this differs from the true mean hardness by at least 0.009 units?

46. Heat transfer coefficients of 65, 63, 60, 68, and 72 were observed in a sample of heat exchangers made by a company. Find a 95% confidence interval for the true average heat transfer coefficient, $\mu$, if

(a) $\sigma^2$ is known to be 17.64.
(b) $\sigma^2$ is unknown.

47. Find the probability that a random sample of 25 observations from a normal population with variance 6 will have a sample variance between 3.100 and 10.750.

48. The hardness (in degrees) of a certain rubber is claimed to be 65. A sample of 14 specimens gave $\bar{X} = 63.1$.

(a) If $\sigma^2$ is known to be 12.25 degrees² for this rubber, can $H_0: \mu = 65$ be accepted against the alternative hypothesis $H_a: \mu \neq 65$ if $\alpha = 5\%$?
(b) Answer part (a) if the sample variance is 10.18 degrees².

49. A manufacturer of steel rods considers that the process is working properly if the mean length of the rods is 8.6 in. The standard deviation of these rods is approximately 0.3 in. Suppose that when 36 rods were tested, the sample mean was 8.45.

(a) Test the hypothesis that the average length is 8.6 in. against the alternative that it is less than 8.6 in., using a 5% level of significance.
(b) Since short rods must be scrapped, it is extremely important to know when the process began to produce rods of mean length less than 8.6. Find the probability of a Type II error when the alternative hypothesis is $H_a: \mu = 8.4$ in.

50. A coffee vending machine is supposed to dispense 6 oz per cup. The machine is tested nine times yielding an average fill, $\bar{X} = 6.1$ oz, with standard deviation 0.15 oz.

(a) Find a 90% confidence interval for $\mu$, the true mean fill per cup.
(b) Find a 90% confidence interval for $\sigma^2$, the true variance of the fill per cup.
51. A random sample of 22 freshman mathematics SAT scores at a large university has sample mean 680 and standard deviation 35. Find a 99% confidence interval for $\mu$, the true population mean.

52. A population has unknown mean $\mu$ but known standard deviation of 5. How large a sample is necessary so that we can be 95% confident that $\bar{X}$ is within 1.5 units of the true mean?

53. A fuel oil company claims that 20% of the homes in a city are heated by oil. Do we have reason to doubt this claim if 236 homes in a sample of 1000 homes are heated by oil? Use $\alpha = 1\%$.

54. A brand of car battery claims that the standard deviation of the battery's lifetime is 0.9 years. If a random sample of 10 of these batteries has s = 1.2, test $H_0: \sigma^2 = 0.81$ against the alternative hypothesis, $H_a: \sigma^2 > 0.81$, if $\alpha = 0.05$.

55. A researcher is studying the weights of male college students. She wishes to test $H_0: \mu = 68$ kg against the alternative hypothesis $H_a: \mu \neq 68$ kg. A sample of 64 students has $\bar{X} = 68.90$ and s = 4 kg.

(a) Is the hypothesis accepted or rejected?
(b) Find $\beta$ for the alternative $\mu = 69.3$ kg.

56. Fractures in metals have been studied and it is thought that the rate at which fractures expand is normally distributed. A sample of 14 pieces of a particular steel gave $\bar{X} = 3205$ ft/s.

(a) Find a 95% confidence interval for $\mu$, the true average rate of expansion, if $\sigma$ is assumed to be 53 ft/s.
(b) Now suppose $\sigma$ is unknown. The sample variance is 6686.53 (ft/s)². Find a 95% confidence interval for $\mu$.

57. Engineers think that a design change will improve the gasoline mileage of a certain brand of automobile. Previously such cars averaged 18 mpg under test conditions. A sample of 15 cars has $\bar{X} = 19.5$ mpg.

(a) Test $H_0: \mu = 18$ against the alternative hypothesis $H_a: \mu > 18$ assuming $\sigma^2 = 9$ and $\alpha = 5\%$.
(b) Test the hypothesis in part (a) at the 5% level if the sample variance is 7.4.

58. One-hour carbon monoxide concentrations in 10 air samples from a city had mean 11.5 ppm and variance 40 (ppm)². After imposing smog control measures on a local industry, 12 air samples had mean 10 ppm and variance 43 (ppm)². Estimate the true difference in average carbon monoxide concentrations in a 98% confidence interval. What assumptions are necessary for your answer to be valid?

59. Specifications for a certain type of ribbon call for a mean breaking strength of 185 lb. In order to monitor the process, a random sample of 30 pieces, selected from different rolls, is taken each hour and the sample mean used to decide if the mean breaking strength has shifted. The test then is of the hypothesis $H_0: \mu = 185$ against the alternative hypothesis $H_a: \mu < 185$ with $\alpha = 0.05$. Assuming $\sigma = 10$ lb,

(a) Find the critical region in terms of $\bar{x}$.
(b) Find $\beta$ for the alternative $\mu = 179.5$.
60. To test $H_0: \mu = 46$ against the alternative hypothesis $H_a: \mu > 46$, a random sample of 24 is taken. The critical region is $\bar{X} > 51.7$.

(a) Find $\alpha$.
(b) Find $\beta$ for the alternative $\mu = 48$.

61. In 16 test runs the gasoline consumption of an experimental engine had sample standard deviation 2.2 gallons. Construct a 95% confidence interval for $\sigma$, the true standard deviation of gasoline consumption of the engine. What assumptions are necessary for your analysis to be valid?

62. A production supervisor wants to determine if changes in a production process reduce the amount of time necessary to complete a subassembly. Specifically she wishes to test $H_0: \mu = 30$ against the alternative hypothesis, $H_a: \mu < 30$, with $\alpha = 5\%$. The measurements are in minutes.

(a) Find the critical region for the test (in terms of $\bar{X}$) if a sample of four times is taken and the true variance is assumed to be 1.2.
(b) Now suppose a sample gave $\bar{X} = 29.06$ and $s^2 = 1.44$. Is the hypothesis accepted or not?
Chapter 5


Bivariate Probability Distributions


5.1 INTRODUCTION


So far we have studied a single random variable defined on the points of a sample space. 
Scientific investigations, however, most commonly involve several random variables aris- 
ing in the course of an investigation. A physicist, for example, may be interested in studying 
the effects of transmissions in a fiber optic cable when transmission rates and the composi- 
tion of the cable are varied; sample surveys usually ask several questions of the respondents 
creating separate random variables for each question; educators studying grade point aver- 
ages for college students find that these averages are dependent on intelligence, entrance 
examinations, rank in high school class, as well as many other factors that could be con- 
sidered. Each of these examples suggests a sample space on which more than one random 
variable is defined. 

While these variables could be considered individually as the univariate variables stud- 
ied in the previous chapters, studies of the individual random variables will provide no 
information at all on how the variables behave together. Separate studies then offer no 
information on how the variables interact or are correlated with each other; this is often 
crucial information in scientific investigations, since the manner in which the variables act 
together may indicate the most important factors in explaining the outcome. Because of 
this, investigations involving only one factor at a time are becoming increasingly rare. The 
interactions revealed in studies are often of greater importance than the effects of the indi- 
vidual variables alone, but measuring them requires that we consider combinations of the 
variables together. In this chapter, we will study jointly distributed random variables and 
some of their characteristics. This is an essential prelude to the actual measurement of the 
influence of separate variables and interactions. Inferences from these measurements are 
statistical problems that are normally discussed in texts on statistics. 


5.2 JOINT AND MARGINAL DISTRIBUTIONS 


Example 5.2.1 


In Example 2.9.1, we considered tossing two fair coins and recording X, the number of 
heads that occur. The coins that come up heads are put aside and only those that come 
up tails the first time are tossed again. Let Y denote the number of heads obtained in the 
second set of tosses. The variable Y is of primary interest here, but to investigate it we must 
consider X as well. Although this might appear to be a purely theoretical exercise, the result
is applicable when a number of components in a system fail according to a binomial model; 
interest centers on when all the components will fail, so our example is a generalization of 
this situation. We use five coins here and only two group tosses (so that it may be that not 
all the coins will turn up heads), but the extension to any number of coins is very similar to 
this special case. 

Y is clearly dependent on X. In fact, since 5 − X coins came up tails the first time and were then tossed again by a binomial process, it follows that if X = x, then the conditional probability that Y = y is given by a binomial probability:

$$P(Y = y \mid X = x) = \binom{5-x}{y}\left(\frac{1}{2}\right)^{5-x}, \quad y = 0, 1, \ldots, 5 - x.$$

X itself is also a random variable and so the unconditional probability that X = x is

$$P(X = x) = \binom{5}{x}\left(\frac{1}{2}\right)^{5}, \quad x = 0, 1, \ldots, 5.$$


If we call 


$$f(x, y) = P(X = x \text{ and } Y = y),$$

which we also denote as

$$f(x, y) = P(X = x, Y = y),$$

the joint probability distribution of X and Y, then

$$f(x, y) = P(X = x, Y = y) = P(X = x) \cdot P(Y = y \mid X = x),$$


where P(Y = y|X = x) is the conditional probability that Y = y if X = x. 
In this example, the conditional probability P(Y = y|X = x) is also binomial with 5 — x 
trials and probability of success at any trial 1/2, as we have seen, so 


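$$f(x, y) = \binom{5}{x}\left(\frac{1}{2}\right)^{5} \cdot \binom{5-x}{y}\left(\frac{1}{2}\right)^{5-x}, \quad x = 0, 1, \ldots, 5; \; y = 0, 1, \ldots, 5 - x,$$

which can be simplified to

$$f(x, y) = \binom{5}{x}\binom{5-x}{y}\left(\frac{1}{2}\right)^{10-x}, \quad x = 0, 1, \ldots, 5; \; y = 0, 1, \ldots, 5 - x.$$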


These probabilities are exhibited in Table 5.1. 
Table 5.1 Joint distribution for the coin tossing example

                                      Y
  X        0          1          2          3          4          5         f(x)
  0     1/1024     5/1024    10/1024    10/1024     5/1024     1/1024     32/1024
  1    10/1024    40/1024    60/1024    40/1024    10/1024       0       160/1024
  2    40/1024   120/1024   120/1024    40/1024       0          0       320/1024
  3    80/1024   160/1024    80/1024       0          0          0       320/1024
  4    80/1024    80/1024       0          0          0          0       160/1024
  5    32/1024       0          0          0          0          0        32/1024
 g(y) 243/1024   405/1024   270/1024    90/1024    15/1024     1/1024       1


Notice that the entries in the table must all be nonnegative (since they represent prob- 
abilities), and that the sum of these entries is 1. 
Probabilities can be found from the table. For example, 


P(X ≥ 2, Y ≥ 2) = 120/1024 + 80/1024 + 40/1024 = 15/64.
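
The entries of the joint table can be regenerated with exact arithmetic. The sketch below is one possible check, assuming the simplified formula for f(x, y) given above; the names `joint`, `table`, and `p_tail` are illustrative only. The probability it computes sums the three nonzero entries with x ≥ 2 and y ≥ 2.

```python
from fractions import Fraction
from math import comb

N = 5  # number of coins in the example


def joint(x, y):
    """f(x, y) = C(5, x) * C(5 - x, y) * (1/2)**(10 - x) on the support."""
    if 0 <= x <= N and 0 <= y <= N - x:
        return comb(N, x) * comb(N - x, y) * Fraction(1, 2) ** (2 * N - x)
    return Fraction(0)


# Build the full 6-by-6 table, including the zero entries.
table = {(x, y): joint(x, y) for x in range(N + 1) for y in range(N + 1)}

total = sum(table.values())  # should be exactly 1
p_tail = sum(p for (x, y), p in table.items() if x >= 2 and y >= 2)

print(total, p_tail)
```

Using `Fraction` keeps every entry as an exact multiple of 1/1024, so the results can be compared directly with the table.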


A scatter plot of the joint probability distribution is also useful (see Figure 5.1). 


Figure 5.1 Scatter plot for the coin tossing example. 


Now suppose we want to recover information on the variables X and Y separately. 
These, individually, are random variables on their own. What are their probability distribu- 
tions? 

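To find P(X = 3), for example, since the three events X = 3 and Y = 0; X = 3 and Y = 1; and X = 3 and Y = 2 are mutually exclusive, we see that

$$P(X = 3) = \binom{5}{3}\binom{2}{0}\left(\frac{1}{2}\right)^{7} + \binom{5}{3}\binom{2}{1}\left(\frac{1}{2}\right)^{7} + \binom{5}{3}\binom{2}{2}\left(\frac{1}{2}\right)^{7}$$

$$= \binom{5}{3}\left(\frac{1}{2}\right)^{7}\left\{\binom{2}{0} + \binom{2}{1} + \binom{2}{2}\right\}$$

$$= \binom{5}{3}\left(\frac{1}{2}\right)^{7} \cdot 2^2 = \binom{5}{3}\left(\frac{1}{2}\right)^{5} = 10/32 = 5/16.$$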


We find this probability entered in the side margin of the table at X = 3. It is found by adding the probabilities across the row, thus considering all the possible values of Y when X = 3.


Other values of P(X = x) could be found in a similar manner and so we make the 
following definition. 


Definition: The marginal distribution of X is given by 


$$P(X = x) = \sum_{y} P(X = x, Y = y),$$


where the sum is over all possible values of y. 


We denote P(X = x) by f(x). The term marginal distribution of X arises since the dis- 
tribution occurs in the margin of the table. 
So, 


$$f(x) = P(X = x) = \sum_{y} P(X = x, Y = y),$$


where the sum is over all possible values of y. 
To find f(x) then in this example, we must calculate 


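$$f(x) = \sum_{y=0}^{5-x} f(x, y) = \sum_{y=0}^{5-x} \binom{5}{x}\binom{5-x}{y}\left(\frac{1}{2}\right)^{10-x} = \binom{5}{x}\left(\frac{1}{2}\right)^{10-x} \sum_{y=0}^{5-x} \binom{5-x}{y} = \binom{5}{x}\left(\frac{1}{2}\right)^{10-x} \cdot 2^{5-x},$$

so

$$f(x) = \binom{5}{x}\left(\frac{1}{2}\right)^{5}, \quad x = 0, 1, \ldots, 5.$$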


This verifies that X is binomial with n = 5 and p = 1/2. 


If we denote the marginal distribution of Y by g(y), then, reasoning in the same way as 
we did for f(x), we conclude that 


$$g(y) = P(Y = y) = \sum_{x} P(X = x, Y = y),$$


where the sum is over all values of x. 
The functions f(x) and g(y) are given in the margins in Table 5.1. 
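In this case it is not so easy to see the pattern in the distribution of Y, but there is one. First, by the definition of g(y),

$$P(Y = y) = g(y) = \sum_{x=0}^{5-y} f(x, y) = \sum_{x=0}^{5-y} \binom{5}{x}\binom{5-x}{y}\left(\frac{1}{2}\right)^{10-x},$$

and this can be written as

$$g(y) = \sum_{x=0}^{5-y} \binom{5}{y}\binom{5-y}{x}\left(\frac{1}{2}\right)^{10-x}.$$

Now we remove common factors, rearrange, and insert the factor $1^{5-x-y}$ to find that we can write g(y) as

$$g(y) = \binom{5}{y}\left(\frac{1}{2}\right)^{10} \sum_{x=0}^{5-y} \binom{5-y}{x}\, 2^{x}\, 1^{5-x-y} = \binom{5}{y}\left(\frac{1}{2}\right)^{10} 3^{5-y}$$

by the binomial theorem. It follows that

$$g(y) = \binom{5}{y}\left(\frac{1}{4}\right)^{5} 3^{5-y}, \quad y = 0, 1, \ldots, 5.$$

Equivalently, $g(y) = \binom{5}{y}\left(\frac{1}{4}\right)^{y}\left(\frac{3}{4}\right)^{5-y}$, so Y is binomial with n = 5 and p = 1/4.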
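
The closed form for g(y) can be compared with the column sums of the joint distribution. The following is a small sketch under the assumption that f(x, y) has the simplified form derived earlier; the helper names are hypothetical. It also computes the mean of Y from the marginal distribution.

```python
from fractions import Fraction
from math import comb

N = 5  # number of coins in the example


def joint(x, y):
    """Joint probability f(x, y) for the two-round coin tossing example."""
    if 0 <= x <= N and 0 <= y <= N - x:
        return comb(N, x) * comb(N - x, y) * Fraction(1, 2) ** (2 * N - x)
    return Fraction(0)


def g_by_summing(y):
    """Marginal of Y obtained by summing the joint distribution over x."""
    return sum(joint(x, y) for x in range(N + 1))


def g_closed_form(y):
    """Binomial(5, 1/4) probability, the closed form for the marginal of Y."""
    return comb(N, y) * Fraction(1, 4) ** y * Fraction(3, 4) ** (N - y)


marginal_y = [g_by_summing(y) for y in range(N + 1)]
matches = all(marginal_y[y] == g_closed_form(y) for y in range(N + 1))
mean_y = sum(y * p for y, p in enumerate(marginal_y))

print(matches, mean_y)
```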


Some characteristics of the distribution of Y may be of interest. A graph of its values is 
shown in Figure 5.2. 


Figure 5.2 Marginal distribution for Y in Example 5.2.1.


Finally, we find E(Y), the expected number of heads as a result of this experiment. A computer algebra system will evaluate $E(Y) = \sum_{y} y \cdot g(y) = 5/4$. This also has an intuitive interpretation. One can argue that as a result of the first set of tosses, 5/2 coins are expected to be tails and, of these, half can be expected to result in heads on the second set of tosses, producing 5/4 as E(Y). Note that by this argument, E(Y) was found without using the probability distribution for Y. It is often possible, and on occasion desirable, to do this. We will give this process more validity later in this chapter. Now we consider a continuous example.


Example 5.2.2 


An investigator, intending to make a certain type of steel stronger, is examining the content 
of the steel. He considers adding carbon (X) and molybdenum (Y) to the steel and measuring 
the resulting strength. However, the carbon and molybdenum interact in a complex way in 
the steel being considered so the investigator takes some data by varying the values of X 
and Y (whose values have been coded here for convenience). He finds that the resulting 
strength of the steel can be approximated by the function 


$$f(x, y) = x^2 + \frac{8}{3}xy \quad \text{for } 0 < x < 1 \text{ and } 0 < y < 1.$$


A graph of this surface is shown in Figure 5.3. 


Figure 5.3 Surface for Example 5.2.2. 


We find that

$$\int_0^1 \int_0^1 f(x, y)\,dy\,dx = 1 \quad \text{and that} \quad f(x, y) \geq 0.$$

Because of these two facts, and in analogy with univariate probability densities, we call f(x, y) a continuous bivariate probability density function. Note again the distinction between discrete probability distributions and continuous probability densities.

Rather than sum, as we did in the discrete example, we integrate to find the marginal 
densities. 
We let

$$f(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \quad \text{and} \quad g(y) = \int_{-\infty}^{\infty} f(x, y)\,dx,$$

provided, of course, that the integrals exist.
In this case,

$$f(x) = \int_0^1 \left(x^2 + \frac{8}{3}xy\right) dy = x^2 + \frac{4}{3}x, \quad 0 < x < 1$$

and

$$g(y) = \int_0^1 \left(x^2 + \frac{8}{3}xy\right) dx = \frac{4y + 1}{3}, \quad 0 < y < 1.$$


Graphs of these probability densities are shown in Figures 5.4 and 5.5: 


Figure 5.4 Marginal distribution for X, Example 5.2.2.


Figure 5.5 Marginal distribution for Y, Example 5.2.2.
We can verify that each of these is a univariate probability density. In Example 5.2.1,
we found E(Y) rather simply and without making use of the probability density for Y alone. 
In this case, it is easy to verify that E(Y) = 11/18, but it is not so easy to see how we would 
find this without finding g(y) first. We will show how this can be done in Section 5.4. 
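
The marginal densities and the value E(Y) = 11/18 can be checked with exact arithmetic. The sketch below is only one way to do this: it assumes the density f(x, y) = x² + (8/3)xy on the unit square and stores it as a dictionary of monomial coefficients, a representation chosen here purely for illustration. The mean is computed as the double integral of y f(x, y).

```python
from fractions import Fraction

# f(x, y) = x^2 + (8/3) x y on the unit square,
# stored as {(i, j): c} meaning the term c * x**i * y**j.
density = {(2, 0): Fraction(1), (1, 1): Fraction(8, 3)}


def integrate_unit_square(poly):
    """Integrate a polynomial in x and y over 0 < x < 1, 0 < y < 1."""
    return sum(c / ((i + 1) * (j + 1)) for (i, j), c in poly.items())


def times_monomial(poly, i0, j0):
    """Multiply a polynomial by x**i0 * y**j0."""
    return {(i + i0, j + j0): c for (i, j), c in poly.items()}


def marginal_x(poly):
    """Integrate y out over (0, 1); returns {i: c} for the terms c * x**i."""
    out = {}
    for (i, j), c in poly.items():
        out[i] = out.get(i, Fraction(0)) + c / (j + 1)
    return out


def marginal_y(poly):
    """Integrate x out over (0, 1); returns {j: c} for the terms c * y**j."""
    out = {}
    for (i, j), c in poly.items():
        out[j] = out.get(j, Fraction(0)) + c / (i + 1)
    return out


total = integrate_unit_square(density)                         # volume under f
f_x = marginal_x(density)                                      # x^2 + (4/3) x
g_y = marginal_y(density)                                      # 1/3 + (4/3) y
mean_y = integrate_unit_square(times_monomial(density, 0, 1))  # E(Y)

print(total, f_x, g_y, mean_y)
```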


Example 5.2.3 


Since the volume under the bivariate probability density function is 1, parts of that volume 
represent probabilities. In this example, we find that 


P(x>5.¥ ehh ( (e+ =x) ) ay v=. 


We can also compute more complex probabilities, such as P(X > Y). To calculate this we 
must integrate over the triangular region in the sample space where X > Y. This gives 


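$$P(X > Y) = \int_0^1 \int_0^x \left(x^2 + \frac{8}{3}xy\right) dy\,dx = \int_0^1 \left[x^2 y + \frac{4xy^2}{3}\right]_0^x dx = \int_0^1 \frac{7x^3}{3}\,dx = \frac{7}{12}.$$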
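
A numerical check of P(X > Y) is sketched below, assuming the density f(x, y) = x² + (8/3)xy on the unit square. It uses a midpoint rule on a square grid, counting cells below the diagonal fully and cells on the diagonal at half weight; the grid size and function names are arbitrary choices made for illustration.

```python
def density(x, y):
    """Joint density of Example 5.2.2 on the unit square."""
    return x * x + (8.0 / 3.0) * x * y


def prob_x_greater_than_y(steps=200):
    """Midpoint-rule estimate of the integral of f over the region x > y."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        for j in range(i + 1):
            y = (j + 0.5) * h
            # Cells strictly below the diagonal lie entirely in x > y;
            # diagonal cells are cut in half by the line x = y.
            weight = 0.5 if i == j else 1.0
            total += weight * density(x, y) * h * h
    return total


estimate = prob_x_greater_than_y()
print(estimate, 7 / 12)
```

The estimate should agree with the exact value 7/12 to several decimal places; a finer grid tightens the agreement at the cost of more function evaluations.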


EXERCISES 5.2 


1. Verify E[Y] in Example 5.2.2. 
2. Find Var[X] and Var[Y] in Example 5.2.2. 


3. An engineering college has made a study of the grade point averages of graduating 
engineers, denoted by the random variable Y. It is desired to study these as a function 
of high school grade point averages, denoted by the random variable X. The following 
table shows the joint probability distribution where the grade point averages have been 
combined into five categories for each variable. 


                          X
           2.0    2.5    3.0    3.5    4.0
     2.0   0.05   0      0.01   0      0
     2.5   0.10   0.04   0      0.01   0
 Y   3.0   0.02   0.10   0.05   0.10   0.01
     3.5   0      0      0.10   0.20   0.10
     4.0   0      0      0.05   0.02   0.05


(a) Find the marginal distributions for X and Y. 
(b) Find E(X) and E(Y). 
(c) Find P(X > 3, Y > 3). 
4. A random sample of 6 items is drawn from a plant's daily production. Let the random variables X and Y denote the number of good items and the number of defective items chosen, respectively. If the production contains 40 good and 10 defective items, find

(a) the joint probability distribution function for X and Y.
(b) the marginal distributions of X and Y.

5. Two cards are chosen without replacement from a deck of 52 cards. Let X denote the number of 3's and Y denote the number of Kings that are drawn.

(a) Find the joint probability distribution of X and Y.
(b) Find the marginal distributions of X and Y.

6. Suppose that X and Y are continuous random variables with joint probability density function

f(x, y) = k, 1 < x < 2, 2 < y < 4,

where k is a constant. (The random variables X and Y are said to have a joint uniform probability density.)

(a) Find k.
(b) Find the marginal densities for X and Y.

7. Suppose the joint (discrete) probability distribution function for discrete random variables X and Y is

P(X = x, Y = y) = k, x = 1, 2, ..., 10; y = 10 − x, 11 − x, ..., 10.

(a) Find k.
(b) Find the marginal distributions for X and Y.

8. A researcher finds that two random variables of interest, X and Y, have joint probability density function

f(x, y) = 24xy, 0 < x < 1, 0 < y < 1 − x.

(a) Show a graph of the joint probability density function.
(b) Calculate the marginal densities.
(c) Find P(X > 1/2, Y < 1/4).

9. A researcher is conducting a sample survey and is interested in a particular question that respondents answer "yes" or "no". Suppose the probability a respondent answers "yes" is p and that respondents' answers are independent. Let X denote the number of yeses in the first $n_1$ trials and Y denote the number of yeses in the next $n_2$ trials.

(a) Show the joint probability distribution function.
(b) Find the probability distribution of the random variable X + Y.

10. Refer to the previous problem. If, in the second set of trials, the probability of a "yes" response has become $p_1 \neq p$, find the joint probability distribution function and the marginal distributions. Explain why the variable X + Y is not binomial.

11. The number of telephone calls, X, that come into an office during a certain period of the day is distributed as a Poisson random variable with $\lambda = 6$ per hour. The calls are answered according to a binomial process with p = 3/4. Let Y denote the number of calls answered.
(a) Find the joint probability distribution of X and Y.
(b) Express P(Y = y) in simple form.
(c) Find E(Y) without using the result in part (b).

12. Suppose that random variables X and Y have joint probability density

$$f(x, y) = \frac{1}{2\pi} e^{-(x^2 + y^2)/2}, \quad -\infty < x < \infty, \; -\infty < y < \infty.$$

(a) Show that the marginal densities are normal.
(b) Find P(X > Y).

13. Three students are randomly selected from a group of three freshmen, two sophomores, and two juniors. Let X denote the number of freshmen selected and Y denote the number of sophomores selected. Find the joint probability distribution of X and Y.

14. Random variables X and Y are jointly distributed random variables with f(x, y) = k, x = 0, 1, 2, 3 and y = 0, 1, 2, ..., 3 − x.

(a) Find k.
(b) Find the marginal densities for X and Y.

15. Suppose that random variables X and Y have joint probability density f(x, y) = kxy on the region bounded by the curves $y = x^2$ and y = x in the first quadrant.

(a) Show that k = 24.
(b) Find the marginal densities f(x) and g(y).


Let X and Y be random variables with joint probability density function f(x, y) = . 
O<y<x, 0<x<l. 


(a) Show that k = 1. 
(b) Find P (X > iYy< +); 
Random variables X and Y have joint probability density 


f(x y)=kx, x-1l<y<l-x, O0<x<l. 


(a) Find k. 
(b) Find g(y), the marginal density for Y. 
(c) Find Var(X). 


Suppose that random variables X and Y have joint probability distribution function 


f(xy) = ty), = 12.33 yS 1,2. 


(a) Find the marginal densities for X and Y. 
(b) Find P(X + Y < 3). 


A fair coin is flipped three times. Let Y be the total number of heads on the first two 
tosses, and let W be the total number of heads on the last two tosses. 


(a) Determine the joint probability distribution of W and Y. 
(b) Find the marginal distributions. 
20. An environmental engineer measures the amount (by weight) of particulate pollution in
air samples of a given volume collected over the smokestack of a coal-operated power
plant. X denotes the amount of pollutant per sample collected when a cleaning device
on the stack is not in operation, and Y denotes the same amount when the cleaning
device is operating. It is known that the joint probability density function for X and Y
is

\[ f(x, y) = k, \quad 0 < x < 2, \; 0 < y < 1, \; x > 2y. \]


(a) Find k. 
(b) Find the marginal densities for X and Y. 


(c) Find the probability that the amount of pollutant with the cleaning device in oper- 
ation is at most 1/3 of the amount without the cleaning device in operation. 


21. Random variables X and Y have joint probability density function 


fauy)=k x20, y20, Ttysl. 


(a) Show that k = 7 


(b) Find P (x >2, Y> i)- 
(c) Find P(X < 1). 
22. A fair die is thrown once; let X denote the result. Then X fair coins are thrown; let Y 
denote the number of heads that occur. 
(a) Find an expression for P(Y = y). 
(b) Explain why E(Y) = 7/4. 


23. Suppose that X and Y are random variables whose joint probability density function is

\[ f(x, y) = 3y, \quad 0 < x < y < 1. \]

(a) Show that f(x, y) is a joint probability density function.
(b) Find the marginal densities.
(c) Show that $E[X/Y] = E[X]/E[Y]$. Does $E[Y/X] = E[Y]/E[X]$?

24. A coin is tossed until a head appears for the first time; denote the number of trials 
necessary by X. Then X of these coins are tossed; let Y denote the number of heads that 
appear. 

(a) Find the joint distribution of X and Y assuming that the coins are fair. 


(b) Find the marginal distributions of X and Y. (X is geometric; to simplify the
distribution of Y, consider the binomial expansion of $(1 - x)^{-n}$.)


(c) Show that E(Y) = 1 whether the coins are fair or not. 


(d) Find the marginal distribution for Y assuming that the coins are loaded to come up 
heads with probability p. 


5.3 CONDITIONAL DISTRIBUTIONS AND DENSITIES 


In Example 5.2.1 we tossed five coins and recorded X, the number of heads that appeared. 
We then tossed the 5 — X coins that came up tails again and recorded Y, the number of 
heads in the second set of tosses. The joint probability distribution function is shown in
Table 5.1.

We might be interested in some conditional probabilities such as the probability that
the second set of tosses showed at least two heads, given that one head appeared on the first
toss, or P(Y ≥ 2 | X = 1).

We cannot look at the row for X = 1 and add the probabilities for Y ≥ 2 since the
probabilities in the row for X = 1 do not add up to 1; that is, the row for X = 1 is not a
probability distribution. However, we know that

\[ P(Y \ge 2 \mid X = 1) = \frac{P(Y \ge 2, X = 1)}{P(X = 1)}, \]

so a probability distribution can be created from the entries in the row for X = 1 by
dividing each of them by P(X = 1). If we do this, we find

\[ P(Y \ge 2 \mid X = 1) = \frac{(60 + 40 + 10)/1024}{160/1024} = \frac{11}{16}. \]

We conclude generally that

\[ P(Y = y \mid X = x) = \frac{P(X = x, Y = y)}{P(X = x)}. \]


This clearly holds for the case of discrete random variables. We proceed in the same way 
for continuous random variables, leading to the following definition. 


Definition: The conditional probability distributions f(y|X = x) and f(x|Y = y), which
we denote by f(y|x) and f(x|y), are defined as

\[ f(y \mid X = x) = f(y \mid x) = \frac{f(x, y)}{f(x)} \]

and

\[ f(x \mid Y = y) = f(x \mid y) = \frac{f(x, y)}{g(y)}, \]

where f(x, y), f(x), and g(y) are the joint and marginal distributions for X and Y, respectively.
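
The coin-tossing probability P(Y ≥ 2 | X = 1) = 11/16 can be checked by applying the definition directly. The following is only a sketch: it rebuilds the joint distribution from the description of the experiment (five fair coins, with the coins showing tails tossed again) rather than reading it from Table 5.1, and the function names `joint` and `conditional` are hypothetical.

```python
from fractions import Fraction
from math import comb

def joint(x, y, n=5):
    """P(X = x, Y = y): x heads among n fair coins, then y heads
    among the n - x coins that showed tails and are tossed again."""
    if not (0 <= x <= n and 0 <= y <= n - x):
        return Fraction(0)
    p_x = Fraction(comb(n, x), 2 ** n)
    p_y_given_x = Fraction(comb(n - x, y), 2 ** (n - x))
    return p_x * p_y_given_x

def conditional(y, x, n=5):
    """P(Y = y | X = x): the joint probability divided by the marginal P(X = x)."""
    marginal_x = sum(joint(x, t, n) for t in range(n - x + 1))
    return joint(x, y, n) / marginal_x

p_at_least_two = sum(conditional(y, 1) for y in range(2, 5))
print(p_at_least_two)  # 11/16
```

Exact fractions are used so that the entries 60/1024, 40/1024, and 10/1024 and the divisor 160/1024 appear without rounding.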
Example 5.3.1 


In Example 5.2.2, we considered the joint probability density function of the continuous
variables X and Y where

\[ f(x, y) = x^2 + \frac{8}{3}xy, \quad 0 < x < 1, \; 0 < y < 1. \]
The conditional densities can be seen geometrically as the intersections of the joint proba- 
bility density surface and vertical or horizontal planes. These curves of intersection are in 


general not probability densities since they do not have area 1, so they must be divided by 
the marginal densities to achieve this. 
Since the marginal densities are $f(x) = x^2 + \frac{4}{3}x$, $0 < x < 1$, and
$g(y) = \frac{1}{3} + \frac{4}{3}y$, $0 < y < 1$, it follows that

\[ f(x \mid y) = \frac{x^2 + \frac{8}{3}xy}{\frac{1}{3} + \frac{4}{3}y} = \frac{3x^2 + 8xy}{1 + 4y}, \quad 0 < x < 1, \]

and

\[ f(y \mid x) = \frac{x^2 + \frac{8}{3}xy}{x^2 + \frac{4}{3}x} = \frac{3x^2 + 8xy}{3x^2 + 4x}, \quad 0 < y < 1. \]

The domain of each variable is denoted above; the remaining symbol is understood to be
fixed.


That each of these is a probability density can be verified by calculating that

\[ \int_0^1 f(x \mid y)\,dx = 1 \quad \text{and} \quad \int_0^1 f(y \mid x)\,dy = 1 \]

and noting that $f(x \mid y) \ge 0$ and $f(y \mid x) \ge 0$.

Areas under these conditional densities are probabilities; for example, if Y = 3/4, then
$f(x \mid y = 3/4) = \frac{3}{4}(x^2 + 2x)$, so

\[ P(X < 1/2 \mid Y = 3/4) = \frac{3}{4}\int_0^{1/2} (x^2 + 2x)\,dx = \frac{7}{32}. \]


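The mean values of the conditional densities are also of interest. These are denoted as
E(Y|X = x) and E(X|Y = y), respectively. We see that

\[ E(Y \mid X = x) = \int_y y \cdot f(y \mid x)\,dy \quad \text{and} \quad E(X \mid Y = y) = \int_x x \cdot f(x \mid y)\,dx. \]

In this example,

\[ E(Y \mid X = x) = \int_0^1 y \cdot \frac{x^2 + \frac{8}{3}xy}{x^2 + \frac{4}{3}x}\,dy = \frac{9x + 16}{6(3x + 4)} \]

and

\[ E(X \mid Y = y) = \int_0^1 x \cdot \frac{x^2 + \frac{8}{3}xy}{\frac{1}{3} + \frac{4}{3}y}\,dx = \frac{9 + 32y}{12(1 + 4y)}. \]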

We note that E(X|Y = y) and E(Y|X = x) are functions of y and x, respectively. 
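
The calculations of Example 5.3.1 can be checked numerically. The sketch below assumes the joint density f(x, y) = x² + (8/3)xy on the unit square and uses a simple midpoint rule; the helper names `integrate` and `conditional_x_given_y` are hypothetical and chosen only for this illustration.

```python
def integrate(g, a, b, n=2000):
    """Midpoint rule for the integral of g over [a, b] using n subintervals."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def joint(x, y):
    """Joint density of Example 5.3.1 on 0 < x < 1, 0 < y < 1."""
    return x ** 2 + (8.0 / 3.0) * x * y

def conditional_x_given_y(y):
    """Return the function x -> f(x | y), dividing the joint density by g(y)."""
    g_y = integrate(lambda t: joint(t, y), 0.0, 1.0)
    return lambda x: joint(x, y) / g_y

f_cond = conditional_x_given_y(0.75)
total = integrate(f_cond, 0.0, 1.0)                   # area under f(x | y = 3/4)
prob = integrate(f_cond, 0.0, 0.5)                    # P(X < 1/2 | Y = 3/4)
mean = integrate(lambda x: x * f_cond(x), 0.0, 1.0)   # E(X | Y = 3/4)
print(total, prob, mean)
```

The three printed values should be close to 1, 7/32 = 0.21875, and 33/48 = 0.6875, the last being the conditional mean E(X|Y = y) evaluated at y = 3/4.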


Example 5.3.2 


Finally, in this section we consider a slightly more complex continuous example. Let

\[ f(x, y) = 2e^{-x-y}, \quad x > 0, \; y > x. \]

Figure 5.6 Sample space for $f(x, y) = 2e^{-x-y}$, x > 0, y > x (the region above the line Y = X).

Figure 5.7 Probability surface for Example 5.3.2.


Here, we must be cautious in determining the limits of integration. A picture of the sample
space is shown in Figure 5.6.

The bivariate function itself, shown in Figure 5.7, is also interesting.

The marginal densities are as follows:

\[ f(x) = \int_x^\infty 2e^{-x-y}\,dy = 2e^{-2x}, \quad x > 0, \]

and

\[ g(y) = \int_0^y 2e^{-x-y}\,dx = 2e^{-y}(1 - e^{-y}), \quad y > 0. \]

The conditional densities are then

\[ f(x \mid y) = \frac{e^{-x}}{1 - e^{-y}}, \quad 0 < x < y, \]

and

\[ f(y \mid x) = e^{x-y}, \quad y \ge x. \]
The reader might verify that each of these is a probability density. The conditional
expectations are then

\[ E(X \mid Y = y) = 1 - \frac{y}{e^y - 1} \quad \text{and} \quad E(Y \mid X = x) = 1 + x. \]


EXERCISES 5.3 


1. In Example 5.3.2, verify that f(x|y) and f(y|x) are each probability densities. 
2. In Example 5.3.2, determine E[X|Y = y] and E[Y|X = x]. 


3. Suppose the joint probability density function of random variables X and Y is given by

\[ f(x, y) = c \cdot (x + y), \quad 0 < x < 1, \; 0 < y < 1. \]


(a) Show that c = 1. 


(b) Find the marginal densities and the conditional densities, verifying in each case 
that these are probability densities. 


(c) Find E(Y|X = x) and E(X|Y = y). 


4. Suppose X and Y are discrete random variables with

\[ f(x, y) = \frac{x + y}{21}, \quad x = 1, 2, 3; \; y = 1, 2. \]
(a) Show that f(x, y) is a probability distribution function. 
(b) Find the conditional distributions. 
(c) Find E(Y|X = x) and E(X|Y = y). 
5. Let f(x, y) = kxy, 0 < x < 1, 0 < y < x for random variables X and Y.

(a) Show that k = 8. 
(b) Find the marginal densities and the conditional densities. 


6. Random variables X and Y have joint probability density function

\[ f(x, y) = c \cdot x \cdot (2 - x - y), \quad 0 < x < 1, \; x < y < 1. \]


(a) Show that c = 8. 
(b) Find the marginal densities. 
(c) Verify that f(y|x) is a probability density. 
7. Suppose that random variables X and Y have joint probability density f(x, y) = k · x · y
for x > 0 and y > 0 on the circle $x^2 + y^2 < 1$.
(a) Find k. 
(b) Find E(Y|X = x) and then find E(Y) from this result. 
8. Let X and Y have joint probability distribution: f(0, 0) = 1/3, f(0,1) = 1/2, fd, ) = 
1/6. 
(a) Show that f(x, y) is a joint probability distribution. 
(b) Find the marginal and conditional distributions. 
9. Random variables X and Y have f(x, y) = k · x · y, where the sample space is the finite
area between $y = x^2$ and $y = x$.


(a) Find E(Y|X = x) and E(X|Y = y). 
(b) Find P(X > 1/2|Y = 1/3). 
10. Suppose X and Y have joint probability density function 


A= s+y"), (2443, Ney<t 


(a) Find P (X > 1,yY< ). 


(b) Find f(y|x). 
(c) Evaluate P (¥ > 51x = 1) : 


11. Let X and Y be random variables with joint probability density function

\[ f(x, y) = 3x + 1, \quad 0 < x < 1, \; 0 < y < 1 - x. \]


(a) Find the marginal densities. 
(b) Find f(y|x). 


12. Random variables X and Y have joint probability density function

\[ f(x, y) = k \cdot x \cdot y^2, \quad 0 < x < 1, \; x < y < 1. \]


(a) Find k. 
(b) Find the marginal densities f(x) and g(y). 
(c) Find the conditional density f(y|x). 


5.4 EXPECTED VALUES AND THE CORRELATION 
COEFFICIENT 


If random variables X and Y have a joint probability density f(x, y), then, as we have seen, X
and Y are univariate random variables and hence have means and variances. It is of course
true that

\[ E(X) = \int_x x \cdot f(x)\,dx \quad \text{and} \quad E(Y) = \int_y y \cdot g(y)\,dy, \]

but these values can also be found from the joint probability density as

\[ E(X) = \int_x \int_y x \cdot f(x, y)\,dy\,dx \quad \text{and} \quad E(Y) = \int_x \int_y y \cdot f(x, y)\,dy\,dx. \tag{5.1} \]
That these relationships are true can be easily established. Consider the above expression
for E(X) and factor out x from the inner integral. This gives

\[ E(X) = \int_x x \left[ \int_y f(x, y)\,dy \right] dx = \int_x x \cdot f(x)\,dx, \]

so the formulas are equivalent. Formulas (5.1) show that the marginals are not needed,
however, since the order of the integration can often be reversed and so the expectations
can be found without finding the marginal densities.

Now we turn to measuring the degree of dependence of one random variable on the
other. The idea of independence is easy to anticipate:

Definition: Random variables X and Y are independent if and only if

\[ f(x, y) = f(x) \cdot g(y) \quad \text{for all values of } x \text{ and } y, \]

where f(x) and g(y) are the marginal distributions or densities.


Usually, it is not necessary to consider the joint density of independent variables since
probabilities can be calculated from the marginal densities. If X and Y are independent,
then

\[ P(a < X < b,\; c < Y < d) = \int_a^b \int_c^d f(x, y)\,dy\,dx = \int_a^b \int_c^d f(x) \cdot g(y)\,dy\,dx = \int_a^b f(x)\,dx \cdot \int_c^d g(y)\,dy, \]

so

\[ P(a < X < b,\; c < Y < d) = P(a < X < b) \cdot P(c < Y < d), \]

showing that the joint density is not needed.
Referring to Example 5.3.2,

\[ f(x, y) = 2e^{-x-y} \ne 2e^{-2x} \cdot 2e^{-y}(1 - e^{-y}) = f(x) \cdot g(y), \]

so X and Y are not independent. This raises the idea of measuring the extent of their
dependence. In order to do this, we first define the covariance of random variables X and Y as
follows:


Definition: The covariance of random variables X and Y is 


\[ \text{Covariance}(X, Y) = \text{Cov}(X, Y) = E\{[X - E(X)][Y - E(Y)]\}. \tag{5.2} \]

As a special case, if X = Y, then Cov(X, Y) = Cov(X, X) and the formula becomes the
variance of X, Var(X). But, unlike the variance, the covariance can be negative. Before
calculating it, however, consider formula (5.2). By expanding it, we find that

\[ \begin{aligned} \text{Cov}(X, Y) &= E[X \cdot Y - X \cdot E(Y) - Y \cdot E(X) + E(X) \cdot E(Y)] \\ &= E(X \cdot Y) - E(X) \cdot E(Y) - E(X) \cdot E(Y) + E(X) \cdot E(Y), \end{aligned} \]

so

\[ \text{Cov}(X, Y) = E(X \cdot Y) - E(X) \cdot E(Y), \]

a result that is often very useful.

In the example we are considering, E(X · Y) = 1, E(X) = 1/2, and E(Y) = 3/2, so
Cov(X, Y) = 1/4.


The covariance is also used to define the correlation coefficient, ρ(x, y), as we do now.

Definition: The correlation coefficient of the random variables X and Y is

\[ \rho(x, y) = \frac{\text{Cov}(X, Y)}{\sigma_x \sigma_y}, \]

where $\sigma_x$ and $\sigma_y$ are the standard deviations of X and Y, respectively.

In this example, we find that $\sigma_x = 1/2$ and $\sigma_y = \sqrt{5}/2$, so
$\rho(x, y) = \frac{1/4}{(1/2)(\sqrt{5}/2)} = \frac{1}{\sqrt{5}} = 0.447214$.
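
For the density f(x, y) = 2e^{−x−y}, x > 0, y > x, of Example 5.3.2, the values Cov(X, Y) = 1/4 and ρ = 1/√5 ≈ 0.447 can be checked by simulation. The sketch below is an illustration, not part of the derivation: it assumes that X can be drawn from its marginal density 2e^{−2x} and that, given X = x, the difference Y − x is exponential with mean 1 (which is what the conditional density e^{x−y}, y ≥ x, says). The function names, the seed, and the sample size are arbitrary choices.

```python
import random
from math import sqrt

def sample_pair(rng):
    """Draw (X, Y) with joint density 2*exp(-x - y) on 0 < x < y."""
    x = rng.expovariate(2.0)        # marginal density of X: 2*exp(-2x)
    y = x + rng.expovariate(1.0)    # given X = x, Y - x is exponential with mean 1
    return x, y

def covariance_and_correlation(pairs):
    """Sample covariance and sample correlation coefficient of a list of pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    return cov, cov / sqrt(var_x * var_y)

rng = random.Random(12345)
pairs = [sample_pair(rng) for _ in range(200_000)]
cov, rho = covariance_and_correlation(pairs)
print(round(cov, 3), round(rho, 3))
```

With a sample of this size the two estimates should agree with 0.25 and 0.447 to roughly two decimal places.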


Now consider jointly distributed random variables X and Y.

\[ \begin{aligned} E(X + Y) &= \int_x \int_y (x + y) \cdot f(x, y)\,dy\,dx \\ &= \int_x \int_y x \cdot f(x, y)\,dy\,dx + \int_x \int_y y \cdot f(x, y)\,dy\,dx, \end{aligned} \]

so

\[ E(X + Y) = E(X) + E(Y), \]

or: The expectation of a sum is the sum of the expectations.

It is also easy to see that if a and b are constants,

\[ E(aX + bY) = aE(X) + bE(Y), \]

since the constants can be factored out of the integrals. The result can be easily generalized:

\[ E(aX + bY + cZ + \cdots) = aE(X) + bE(Y) + cE(Z) + \cdots \]


As might be expected, variances of sums are a bit more complicated than expectations of
sums. We begin with Var(aX + bY). By definition this is

\[ \begin{aligned} \text{Var}(aX + bY) &= E[aX + bY - E(aX + bY)]^2 \\ &= E\{a[X - E(X)] + b[Y - E(Y)]\}^2. \end{aligned} \]

Now squaring, factoring out the constants, and taking the expectation term by term, we find

\[ \text{Var}(aX + bY) = a^2 E[X - E(X)]^2 + 2ab\,E\{[X - E(X)][Y - E(Y)]\} + b^2 E[Y - E(Y)]^2, \]

and we recognize the terms in this expression as

\[ \text{Var}(aX + bY) = a^2\,\text{Var}(X) + 2ab\,\text{Cov}(X, Y) + b^2\,\text{Var}(Y). \tag{5.3} \]


So we cannot say, as we did with expectations, that the variance of a sum is the sum of the
variances, but this would be true if the covariance were zero. When does this occur? If X
and Y are independent, then

\[ \begin{aligned} E(X \cdot Y) &= \int_x \int_y x \cdot y \cdot f(x, y)\,dy\,dx \\ &= \int_x \int_y x \cdot y \cdot f(x) \cdot g(y)\,dy\,dx \\ &= \int_x x \left[ \int_y y \cdot g(y)\,dy \right] f(x)\,dx \\ &= E(Y) \int_x x \cdot f(x)\,dx = E(X) \cdot E(Y). \end{aligned} \]

Since E(X · Y) = E(X) · E(Y), Cov(X, Y) = 0. So we can say that if X and Y are independent,
then

\[ \text{Var}(aX + bY) = a^2\,\text{Var}(X) + b^2\,\text{Var}(Y). \]


But the converse of this assertion is false: that is, if Cov(X, Y) = 0, then X and Y are not
necessarily independent. An example will establish this. Consider the joint distribution of
X and Y as given in the following table:

              Y
   X     -1     0     1
  -1      a     b     a
   0      b     0     b
   1      a     b     a

We select a and b so that 4a + 4b = 1. Take a = 1/6 and b = 1/12 as an example among
many choices that could be made. The symmetry in the table shows that E(X) = E(Y) = 0
and that E(X · Y) = 0. So X and Y have Cov(X, Y) = 0. But P(X = −1, Y = −1) = 1/6 ≠
(5/12) · (5/12) = 25/144 = P(X = −1) · P(Y = −1), so X and Y are not independent. To take
the more general case,

\[ P(X = -1, Y = -1) = a \ne (2a + b)^2 = P(X = -1) \cdot P(Y = -1), \]

so X and Y are not independent. (Equality would require a = 1/4 and b = 0.)

If Cov(X, Y) = 0, we call X and Y uncorrelated. We conclude that if X and Y are
independent, then they are uncorrelated, but uncorrelated variables are not necessarily
independent.
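
The claim that uncorrelated variables need not be independent can be checked exactly for a symmetric 3 × 3 table with corner probabilities a, edge probabilities b, and center probability 0, using a = 1/6 and b = 1/12. The sketch below is an illustration only; the dictionary layout and the helper name `expect` are hypothetical.

```python
from fractions import Fraction

a, b = Fraction(1, 6), Fraction(1, 12)
values = (-1, 0, 1)

# joint[(x, y)] = P(X = x, Y = y): corners a, edge midpoints b, center 0
joint = {
    (-1, -1): a, (-1, 0): b, (-1, 1): a,
    (0, -1): b, (0, 0): Fraction(0), (0, 1): b,
    (1, -1): a, (1, 0): b, (1, 1): a,
}

def expect(g):
    """Expected value of g(X, Y) under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)
p_x = {x: sum(joint[(x, y)] for y in values) for x in values}
p_y = {y: sum(joint[(x, y)] for x in values) for y in values}
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for x in values for y in values)
print(cov, independent)  # 0 False
```

The covariance is exactly zero, yet the joint probability at (−1, −1) is 1/6 while the product of the marginals there is 25/144.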
Finally, in this section we establish a useful fact, namely, that the correlation coefficient,
ρ, is always in the interval from −1 to 1:

\[ -1 \le \rho(x, y) \le 1. \]

As a proof of this, consider variables X and Y that each have mean 0 and variance 1. (If
this is not the case, transform the variables by subtracting their means and dividing by their
standard deviations, producing X and Y.)

Since the variance of any variable is nonnegative,

\[ \text{Var}(X - Y) \ge 0, \]

or, by formula (5.3),

\[ \text{Var}(X - Y) = \text{Var}(X) - 2\,\text{Cov}(X, Y) + \text{Var}(Y) \ge 0. \]

But Var(X) = Var(Y) = 1 and Cov(X, Y) = ρ, so

\[ 1 - 2\rho + 1 \ge 0, \]

which implies that ρ ≤ 1.

The other half of the inequality can be established in a similar way by noting that
Var(X + Y) ≥ 0.

The reader will be asked in problem 5 to show that the transformations done to
ensure that the variables have mean 0 and variance 1 do not affect the correlation
coefficient.


Example 5.4.1 


The fact that the expectation of a sum is the sum of the expectations and the fact that, if the 
summands are independent, then the variance of the sum is the sum of the variances, can be 
used to provide a neat derivation of the mean and variance of a binomial random variable. 

Suppose that the random variable X represents the number of successes when there are 
n independent trials of a binomial random variable with probability p of success at any trial. 
Now define the variables 


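\[ X_1 = \begin{cases} 1 & \text{if the first trial is a success} \\ 0 & \text{otherwise} \end{cases} \]

\[ X_2 = \begin{cases} 1 & \text{if the second trial is a success} \\ 0 & \text{otherwise} \end{cases} \]

\[ \vdots \]

\[ X_n = \begin{cases} 1 & \text{if the } n\text{th trial is a success} \\ 0 & \text{otherwise.} \end{cases} \]

The $X_i$'s are often called indicator random variables.

Since $X_i$ is 1 only when a success occurs and is 0 when a failure occurs,

\[ X = X_1 + X_2 + \cdots + X_n = \sum_{i=1}^{n} X_i. \]

Now

\[ E(X_i) = 1 \cdot p + 0 \cdot (1 - p) = p \quad \text{and} \quad E(X_i^2) = 1^2 \cdot p + 0^2 \cdot (1 - p) = p, \]

so

\[ E(X) = E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} p = np. \]

Also,

\[ \text{Var}(X_i) = E(X_i^2) - [E(X_i)]^2 = p - p^2 = p(1 - p) = pq, \]

so, since the trials are independent,

\[ \text{Var}(X) = \text{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \text{Var}(X_i) = \sum_{i=1}^{n} pq = npq. \]

We have again established the formulas for the mean and variance of a binomial random
variable.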
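
The formulas E(X) = np and Var(X) = npq, with q = 1 − p, can be confirmed for a small case by listing every sequence of indicator values. The sketch below is an illustration under the stated assumptions (n independent trials, each a success with probability p); the function name `binomial_moments` and the choice n = 6, p = 1/3 are hypothetical.

```python
from fractions import Fraction
from itertools import product

def binomial_moments(n, p):
    """Mean and variance of X = X_1 + ... + X_n, found by enumerating
    all 2**n sequences of 0/1 indicator values."""
    q = 1 - p
    mean = Fraction(0)
    second_moment = Fraction(0)
    for outcome in product((0, 1), repeat=n):
        successes = sum(outcome)
        prob = p ** successes * q ** (n - successes)
        mean += successes * prob
        second_moment += successes ** 2 * prob
    return mean, second_moment - mean ** 2

n, p = 6, Fraction(1, 3)
mean, variance = binomial_moments(n, p)
print(mean, variance)  # 2 4/3
```

Here np = 2 and npq = 6 · (1/3) · (2/3) = 4/3, in agreement with the enumeration.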


5.5 CONDITIONAL EXPECTATIONS 


Recall Example 5.2.1 of this chapter where five fair coins were tossed, the coins showing 
heads being put aside and those showing tails tossed again. We called the random variable 
X the number of coins showing heads on the first toss and the random variable Y the number 
of coins showing heads on the second set of tosses. E(Y) is of interest. 

First note that E(Y|X = x) and E(X|Y = y) are functions of x and y, respectively. If we
let

\[ E(Y \mid X = x) = k(x), \]

then we could consider the function k(X) of the random variable X. This is itself a random
variable. We denote this random variable by E(Y|X), so

\[ E(Y \mid X) = k(X). \]

We now return to our example. Since there are 5 − x coins to be tossed the second time and
the probability of success is 1/2, it follows that E(Y|X = x) = (5 − x)/2. This conditional
expectation E(Y|X) is a function of X. It also has an expectation, which is

\[ E[E(Y \mid X)] = E\left[\frac{5 - X}{2}\right] = \frac{5}{2} - \frac{E(X)}{2} = \frac{5}{2} - \frac{5/2}{2} = \frac{5}{4}, \]

which we found as E(Y).

So we conjecture that E[E(Y|X)] = E(Y) and that E[E(X|Y)] = E(X).

We now justify this process of establishing unconditional expectations based on con- 
ditional expectations. 

It is essential to note here that E(Y|X) is a function of X; its expectation is found using 
the marginal distribution of X. Similarly, E(X|Y) is a function of Y, and its expectation is 
found using the marginal distribution of Y. 

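We will give a proof that E[E(Y|X)] = E(Y) using a continuous bivariate distribution,
say f(x, y).

First we note that

\[ \begin{aligned} E[E(Y \mid X)] &= \int_x E(Y \mid X = x) \cdot f(x)\,dx \\ &= \int_x \left[ \int_y y \cdot \frac{f(x, y)}{f(x)}\,dy \right] f(x)\,dx \\ &= \int_x \int_y y \cdot f(x, y)\,dy\,dx = E(Y). \end{aligned} \]

The proof that E[E(X|Y)] = E(X) is of course similar.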
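
The same identity holds for discrete variables, with sums in place of integrals, and it can be checked exactly on the coin example of this section. The sketch below assumes five fair coins, with X the number of heads on the first toss and Y the number of heads among the 5 − X coins tossed again; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

N = 5
HALF = Fraction(1, 2)

def p_x(x):
    """P(X = x): x heads among N fair coins."""
    return comb(N, x) * HALF ** N

def p_y_given_x(y, x):
    """P(Y = y | X = x): y heads among the N - x coins tossed again."""
    return comb(N - x, y) * HALF ** (N - x)

def cond_mean_y(x):
    """E(Y | X = x), computed from the conditional distribution."""
    return sum(y * p_y_given_x(y, x) for y in range(N - x + 1))

# E[E(Y|X)] uses the marginal distribution of X.
e_of_cond = sum(cond_mean_y(x) * p_x(x) for x in range(N + 1))
# E(Y) computed directly from the joint distribution.
e_y_direct = sum(y * p_x(x) * p_y_given_x(y, x)
                 for x in range(N + 1) for y in range(N - x + 1))
print(e_of_cond, e_y_direct)  # 5/4 5/4
```

Note that the outer sum weights each conditional mean by P(X = x), which is exactly the point that the expectation of E(Y|X) is taken with respect to the marginal distribution of X.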


Example 5.5.1 


We apply these results to Example 5.3.2. In this example, $f(x, y) = 2e^{-x-y}$, x ≥ 0, y ≥ x. We
found that $f(y \mid x) = e^{x-y}$, y ≥ x, and that E(Y|X) = 1 + X.

Now we calculate

\[ E(Y) = \int_0^\infty 2 \cdot y \cdot e^{-y} \cdot (1 - e^{-y})\,dy = 3/2. \]

Also,

\[ E[E(Y \mid X)] = E(1 + X) = \int_0^\infty (1 + x) \cdot 2 \cdot e^{-2x}\,dx. \]

This integral, as the reader can easily check, is also 3/2.


Example 5.5.2 


An observation, X, is taken from a uniform density on (0, 1), then observations are taken 
until the result exceeds X. Call this observation Y. What is the expected value of Y? 

It appears obvious that, on average, the first observation is 1/2. Then Y can be consid- 
ered to be a uniform variable on the interval (1/2,1), so its expectation is 3/4. Let us make 
these calculations more formal so that the technique could be applied in a less obvious 


situation. 
We have that X is uniform on (0, 1) so that f(x) = 1, 0 < x < 1, and Y is uniform on (x, 1)
so that

\[ f(y \mid x) = \frac{1}{1 - x}, \quad x < y < 1. \]

The joint density is then

\[ f(x, y) = f(x) \cdot f(y \mid x) = \frac{1}{1 - x}, \quad x < y < 1, \; 0 < x < 1. \]

The marginal density for Y is then

\[ g(y) = \int_0^y \frac{1}{1 - x}\,dx = -\ln(1 - y), \quad 0 < y < 1, \]

and

\[ E(Y) = \int_0^1 -y \ln(1 - y)\,dy = \frac{3}{4}, \]

verifying our earlier informal result.
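
A simulation offers another check of E(Y) = 3/4. The sketch below is an illustration only. It draws Y directly from the uniform density on (x, 1), which is the conditional model used in the example; this is assumed to be equivalent to sampling repeatedly until an observation exceeds x. The function name, seed, and number of trials are arbitrary.

```python
import random

def simulate_mean_y(trials, seed=2024):
    """Estimate E(Y) when X is uniform on (0, 1) and, given X = x,
    Y is uniform on (x, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = rng.random()
        y = x + (1.0 - x) * rng.random()   # uniform on (x, 1)
        total += y
    return total / trials

estimate = simulate_mean_y(200_000)
print(round(estimate, 3))
```

Drawing Y directly avoids the very long runs of rejected observations that occur when x happens to fall close to 1.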


EXERCISES 5.5 


. In Example 5.3.2, find that Cov(X, Y) = 1/4.
. In Example 5.3.2, verify that E[Y] = 3/2.
. Show that ρ(X, Y) in Example 5.2.1 is $-1/\sqrt{3}$.

. Find the correlation coefficient between X and Y for the probability density:

\[ f(x, y) = x + y, \quad 0 < x < 1, \; 0 < y < 1. \]

. Show that Cov(aX + bY, cX + dY) = ac Var(X) + (ad + bc) Cov(X, Y) + bd Var(Y).
. Show that Cov(X − Y, X + Y) = $\sigma_x^2 - \sigma_y^2$.

. Show that ρ(aX + b, cY + d) = ρ(X, Y), provided that a > 0 and c > 0.
. Let f(x, y) = k, 0 < x < 2, 0 < y < 1, x ≥ 2y.
(a) Are X and Y independent? 
(b) Find P(Y < X/3). 
9. Let f(x,y) = — (7+ ao), O0<x<2;-2<y<2. 

(a) Show that f(x, y) is a joint probability density function. 
(b) Show that X and Y are independent. 

10. Let f@, y) = (2) Q +y*), -l<x<ls-l<y<l. 

(a) Verify that f(x, y) is a joint probability density function. 

(b) Find the marginal densities, f(x) and g(y). 

(c) Find the conditional densities, f(x|y) and f(y|x). 

(d) Verify that E[E(X|Y)] = E(x). 

(e) Find P (x >i\V= ;)- 


oN N MN 


11. Let f@,y) = 1+ 5 = ay, -1/2<x<1/2;-1/2<y< 1/2. 
(a) Show that f(x, y) is a joint probability density function. 
(b) Show that f(x|y) = f(@, y). 

12. Let $f(x, y) = k \cdot x^2 \cdot (8 - y)$, x < y < 2x, 0 < x < 2.
(a) Find the marginal and conditional densities.
(b) Find E(Y) from the marginal density of Y and then verify that E[E(Y|X)] = E(Y).
13. Random variables X and Y have joint probability density function 


faye, 0<x2<2, x <y <4, 


Lu 


(a) Show that k = z 


(b) Find f(xly). 
(c) Find E(X|Y = y). 
14. Suppose that the joint probability density function for random variables X and Y is 


\[ f(x, y) = 2x + 2y - 4xy, \quad 0 \le x \le 1, \; 0 \le y \le 1. \]


Are X and Y independent? Why or why not? 
15. The joint probability density function for random variables X and Y is 


fy=k, -l<x<l, e<y<l 


Find the conditional densities f(x|y) and f(y|x) and show that each of these is a proba- 
bility density function. 


16. Suppose that random variables X and Y have joint probability density function f(x, y) =
kx(2 − x − y), 0 < y < 1, 0 < x < y. Find f(x|y) and show that this is a probability
density function.


17. Random variables X and Y have joint probability distribution function 
f@,y) = * x=1,2,3 and y=1,2,... ,4-x. 


(a) Find formulas for f(x) and g(y), the marginal densities. 
(b) Find a formula for f(y|x) and explain why the result is a probability distribution 
function. 
18. Find the correlation coefficient between X and Y if their joint probability density func- 
tion is 
\[ f(x, y) = k, \quad 0 < x < y, \; 0 < y < 1. \]


19. Random variables X and Y are discrete with joint distribution given by 


y 
Oo 1 

1 
re 
; & 
3 £46 


Find the correlation coefficient between X and Y. 
20. Random variables X and Y have joint probability density function f(x, y) = -, x + 
y* < 1. Are X and Y independent? 
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28. 


29. 
Let random variables X and Y have joint probability density function f(x, y) = kxy on
the region bounded by 0 < x < 2, 0 < y < x.


(a) Show that k = 1/2. 

(b) Find the marginal densities f(x) and g(y). 

(c) Are X and Y independent? 

(d) Find P (x >= 1). 

Random variables X and Y are uniformly distributed on the region x* <y <1, 0< 
x < 1. Verify that E(Y|X) = E(Y). 

Suppose that X and Y are random variables with o2 = 10, Oo, = 20, and p= *. Find 
Var(2X — 3Y). 

Let X,,X,, and X3 be random variables with E(X;) = 0 and Var(x;)= 1, i= 1,2,3. 
Also, Cov(X;,X;) = —4 if i#j. Find Var (os iX;). 


Let X,,X>,X3, ... ,X,, be uncorrelated random variables with common variance o7. 


Show that X = Diet X; and X; — X are uncorrelated, j = 1,2, ... ,n. 


A box contains five red and two green marbles. Two marbles are drawn out, the first 
not being replaced before the second is drawn. Let X be | if the first marble is red and 
0 otherwise; Y is 1 if the second marble is red and is 0 otherwise. Find the correlation 
coefficient between X and Y. 


In four tosses of a fair coin, let X be the number of heads and Y be the length of the 
longest run of heads (a sequence of tosses of heads). For example, in the sequence 
HTHH, X = 3 and Y = 2. Find the correlation coefficient between X and Y. 


An observation, X, is taken from the exponential distribution, $f(x) = e^{-x}$, x > 0.
Sampling then continues until an observation, Y, is at most X. Show that $E(Y) = 2 - \pi^2/6$.

[Hint: Expand the integral in a power series to show that

\[ \int_0^\infty \frac{x e^{-2x}}{1 - e^{-x}}\,dx = \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots \]

.]
Variances as well as expected values can be found by conditioning. Consider the for- 
mula 


\[ \text{Var}(X) = E[\text{Var}(X \mid Y)] + \text{Var}[E(X \mid Y)]. \]

(a) Verify that the formula gives the correct result for the distribution in Example
5.3.2,

\[ f(x, y) = 2e^{-x-y}, \quad x \ge 0, \; y \ge x. \]


(b) Prove that the formula is correct in general. 


The fair wheel is spun once and the result, X, is recorded. Then the wheel is spun again 
until the result is less than X. Call the second result Y. 


(a) Find the joint probability density for X and Y. 
(b) Find E(X|Y) and E(Y|X). 

1 
(c) ELE(YIX)] = 3. 
5.6 BIVARIATE NORMAL DENSITIES


The bivariate extension of the normal density is a very important example of a bivariate 
density. We study this density and some of its applications in this section. 
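We say that X and Y have a bivariate normal density if

\[ f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left(\frac{x - \mu_x}{\sigma_x}\right)^2 - 2\rho \left(\frac{x - \mu_x}{\sigma_x}\right)\left(\frac{y - \mu_y}{\sigma_y}\right) + \left(\frac{y - \mu_y}{\sigma_y}\right)^2 \right] \right\} \]

for −∞ < x < ∞ and −∞ < y < ∞. Note that there are five parameters: the means and
variances of each variable as well as ρ, the correlation coefficient between X and Y.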

A graph of a typical bivariate normal surface is shown in Figure 5.8. The surface is a
standard bivariate normal surface since X and Y each have mean 0 and variance 1 and ρ
has been taken to be 0.

As one might expect, the marginal and conditional densities (and indeed the intersection
of the surface with any plane perpendicular to the X, Y plane) are normal densities.
We now establish some of these facts. For convenience, and without any loss of generality,
we consider a bivariate normal surface with $\mu_x = \mu_y = 0$ and $\sigma_x = \sigma_y = 1$. We begin with a
proof that the volume under the surface is 1.

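The function we are considering is now

\[ f(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}}\, e^{-\frac{x^2 - 2\rho xy + y^2}{2(1 - \rho^2)}}, \quad -\infty < x < \infty \text{ and } -\infty < y < \infty. \]

Completing the square in the exponent gives

\[ f(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}}\, e^{-\frac{(x - \rho y)^2}{2(1 - \rho^2)}}\, e^{-\frac{1}{2}y^2}. \]

Figure 5.8 Normal probability surface, ρ = 0.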
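So

\[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\sqrt{1 - \rho^2}}\, e^{-\frac{(x - \rho y)^2}{2(1 - \rho^2)}}\,dx \right] \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y^2}\,dy. \]

The inner integral represents the area under a normal curve with mean ρy and variance
1 − ρ² and so is 1. The outer integral is then the area under a standard normal curve and so
is 1 also, showing that f(x, y) has volume 1.

This also shows that the marginal density for Y is

\[ g(y) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y^2}, \quad -\infty < y < \infty, \]

with a similar result for f(x).

Finding f(x, y)/g(y) from the above gives

\[ f(x \mid y) = \frac{1}{\sqrt{2\pi}\sqrt{1 - \rho^2}}\, e^{-\frac{(x - \rho y)^2}{2(1 - \rho^2)}}, \]

which is a normal curve with mean ρy and standard deviation $\sqrt{1 - \rho^2}$.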

Now let us return to the general case where the density is not a standard bivariate
normal density. It is easy to show that the marginals are now $N(\mu_x, \sigma_x)$ and $N(\mu_y, \sigma_y)$, while
the conditional densities are

\[ f(y \mid x) = N\left[\mu_y + \rho\frac{\sigma_y}{\sigma_x}(x - \mu_x),\; \sigma_y\sqrt{1 - \rho^2}\right] \]

and

\[ f(x \mid y) = N\left[\mu_x + \rho\frac{\sigma_x}{\sigma_y}(y - \mu_y),\; \sigma_x\sqrt{1 - \rho^2}\right]. \]

The expected value of Y given X = x, E(Y|X = x), is called the regression of Y on X. Here,
we find that

\[ E(Y \mid X = x) = \mu_y + \rho\frac{\sigma_y}{\sigma_x}(x - \mu_x), \]

a straight line.

If ρ = 0, then note that f(x, y) = f(x) · g(y), so that X and Y are independent. In this case,
it is probably easiest to use the individual marginal densities in finding probabilities. If
ρ ≠ 0, it is probably best to standardize the variables before calculating probabilities. Some
computer algebra systems can calculate bivariate probabilities.
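
Where no such system is at hand, a probability over a rectangle for a standard bivariate normal density can be estimated by simulation. The sketch below is an illustration, not a definitive method. It assumes the construction X = Z₁, Y = ρZ₁ + √(1 − ρ²)Z₂ with Z₁ and Z₂ independent standard normal variables, which makes the conditional distribution of Y given X = x normal with mean ρx and standard deviation √(1 − ρ²). The function names, seed, and number of trials are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist

def sample_standard_bivariate(rho, rng):
    """One draw (X, Y) from a standard bivariate normal density with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    return z1, rho * z1 + sqrt(1.0 - rho ** 2) * z2

def rectangle_probability(rho, x_lo, x_hi, y_lo, y_hi, trials=200_000, seed=7):
    """Monte Carlo estimate of P(x_lo < X < x_hi, y_lo < Y < y_hi)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = sample_standard_bivariate(rho, rng)
        if x_lo < x < x_hi and y_lo < y < y_hi:
            hits += 1
    return hits / trials

phi = NormalDist().cdf
# With rho = 0 the variables are independent, so the marginals give the exact answer.
exact_independent = (phi(1) - phi(-2)) * (phi(2) - phi(0))
estimate_independent = rectangle_probability(0.0, -2, 1, 0, 2)
estimate_correlated = rectangle_probability(0.6, -2, 1, 0, 2)
print(round(exact_independent, 4), round(estimate_independent, 4), round(estimate_correlated, 4))
```

The first two printed values should nearly agree, which gives some confidence in the estimate for ρ = 0.6, where the marginal densities alone do not determine the probability.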

In Figures 5.9 and 5.10, we show two graphs to indicate the changes that result in the 
shape of the surface when the correlation coefficient varies. 
Figure 5.10 Normal bivariate surface, ρ = 0.9.


Contour Plots


Contours or level curves of a surface show the location of points for which the function, or
height of the surface, takes on constant values. The contours are then slices of the surface
with planes parallel to the X, Y plane.

If ρ = 0, we expect the contours to be circles, as shown in Figure 5.11.
If ρ = 0.9, however, the contours become ellipses as shown in Figure 5.12.

Figure 5.11 Circular contours for a normal probability surface, ρ = 0.

Figure 5.12 Elliptical contours for a normal probability surface, ρ = 0.9.
EXERCISES 5.6


1. Let X and Y have a standard bivariate normal density with ρ = 0.6.

(a) Show that the marginal densities are normal.
(b) Show that the conditional densities are normal.
(c) Calculate P(−2 < X < 1, 0 < Y < 2).

2. Height (X) and intelligence (Y) are presumed to be random variables that have a slight
positive correlation coefficient. Suppose that these characteristics for a group of people
are distributed according to a bivariate normal curve with $\mu_x = 67$, $\sigma_x = 4''$, $\mu_y = 114$,
$\sigma_y = 10$, and ρ = 0.20.

(a) Find P(66 < X < 70, 107 < Y < 123). 


(b) Find the probability that a person whose height is 5’7” has an intelligence quotient 
of at least 121. 


(c) Find the regression of Y on X. 


3. Show that ρ is the correlation coefficient between X and Y when these variables have
a bivariate normal density.


4. The guidance system for a missile is being tested. The aiming point of the missile (X, Y)
is presumed to have a bivariate normal density with $\mu_x = 0$, $\sigma_x = 1$, $\mu_y = 0$, $\sigma_y = 6$, and
ρ = 0.42. Find the probability the missile lands within 2 units of the origin.


5. Show that uncorrelated bivariate normal random variables are independent. 


5.7 FUNCTIONS OF RANDOM VARIABLES 


Joint distributions can be used to establish the probability densities of sums of random 
variables and can also be utilized to find the probability densities of products and quotients. 
We show how this is done through examples. 


Example 5.7.1 


Consider again two independent variables, X and Y, each uniformly distributed on (0, 1).
The joint density is then f(x, y) = 1, 0 < x < 1, 0 < y < 1.
If Z = X + Y, then the distribution function of Z, G(z), can be found by calculating
volumes beneath the joint density. The diagram in Figure 5.13 will help in doing this.
Computing volumes under the joint density function, we find that

(a) $G(z) = P(X + Y \le z) = \frac{1}{2}z^2$, 0 ≤ z ≤ 1, and
(b) $G(z) = 1 - \frac{1}{2}[1 - (z - 1)]^2$, 1 ≤ z ≤ 2.

It follows from this that

\[ g(z) = \begin{cases} z, & 0 < z < 1 \\ 2 - z, & 1 < z < 2. \end{cases} \]

This gives the triangular density we have seen previously. 
The technique is feasible for more difficult densities as the next example shows. 
Figure 5.13 Jointly distributed uniform variables.
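
The volume calculation for the sum of two independent uniform variables can be imitated numerically. The sketch below is an illustration only: it approximates P(X + Y ≤ z) by counting the midpoints of a grid over the unit square, where the joint density is 1, and compares the result with G(z) = z²/2 for 0 ≤ z ≤ 1 and G(z) = 1 − (2 − z)²/2 for 1 ≤ z ≤ 2. The function names and grid size are hypothetical choices.

```python
def cdf_on_unit_square(h, z, n=400):
    """Approximate P(h(X, Y) <= z) for independent uniform X and Y on (0, 1)
    by the fraction of midpoints of an n-by-n grid with h(x, y) <= z."""
    step = 1.0 / n
    count = 0
    for i in range(n):
        x = (i + 0.5) * step
        for j in range(n):
            y = (j + 0.5) * step
            if h(x, y) <= z:
                count += 1
    return count / (n * n)

def triangular_cdf(z):
    """Distribution function of Z = X + Y for 0 <= z <= 2."""
    return 0.5 * z ** 2 if z <= 1 else 1 - 0.5 * (2 - z) ** 2

for z in (0.5, 1.0, 1.5):
    print(z, round(cdf_on_unit_square(lambda x, y: x + y, z), 3), triangular_cdf(z))
```

Because the joint density is constant, the volume under it is just the area of the region, so counting grid cells is enough; a nonuniform density would need each cell weighted by its density value.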


Example 5.7.2 


Suppose X and Y are each independently exponentially distributed with f(x) = Ae~**, 
x > 0, with a similar distribution for Y. The distribution function for Z = X + Y can be 
found by considering first the sample space shown in Figure 5.14. 


Figure 5.14 Sample space for two independent exponential variables.

Now

$$G(z) = \int_0^z \int_0^{z-x} \lambda^2 e^{-\lambda(x+y)} \, dy \, dx,$$

which can be found to be

$$G(z) = 1 - e^{-\lambda z} - \lambda z e^{-\lambda z},$$

so that

$$g(z) = \lambda^2 z e^{-\lambda z}, \quad z > 0.$$


This is the gamma density we have seen before. 
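For readers who want the intermediate steps in Example 5.7.2, the double integral can be worked out as follows. This is a sketch of one route through the calculation, using only the exponential densities stated in the example.

```latex
\begin{aligned}
G(z) &= \int_0^z \lambda e^{-\lambda x}\left[\int_0^{z-x} \lambda e^{-\lambda y}\,dy\right]dx
      = \int_0^z \lambda e^{-\lambda x}\left(1 - e^{-\lambda (z-x)}\right)dx \\
     &= \int_0^z \lambda e^{-\lambda x}\,dx - \lambda e^{-\lambda z}\int_0^z dx
      = 1 - e^{-\lambda z} - \lambda z e^{-\lambda z}, \\
g(z) &= G'(z) = \lambda e^{-\lambda z} - \lambda e^{-\lambda z} + \lambda^2 z e^{-\lambda z}
      = \lambda^2 z e^{-\lambda z}, \qquad z > 0.
\end{aligned}
```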
This technique will work nicely on other functions of random variables, such as products or
quotients. An example of each follows.


Example 5.7.3 


Let X and Y be independently distributed uniform random variables on (0,1) and consider
the product, Z = X · Y. Figure 5.15 will help in seeing that the appropriate volume under
the joint density gives

$$G(z) = z \cdot 1 + \int_z^1 \int_0^{z/x} 1 \, dy \, dx,$$

which is found to be

$$G(z) = z - z \ln z,$$

so that

$$g(z) = -\ln z, \quad 0 < z < 1.$$


Figure 5.15 Sample space for the product of two uniform random variables.

Figure 5.16 Sample space for the quotient of two uniform random variables.


Example 5.7.4


As a final example in this section, we consider the quotient of two independently distributed 
uniform variables, Z = X/Y. Figure 5.16 shows the relevant geometry. 

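Separating the two cases here and noting that Z = X/Y > z below the line Y = X/z, we
have


(a) $G(z) = 1 - \int_0^1 \int_0^{x/z} 1 \, dy \, dx = 1 - \frac{1}{2z}$, $1 < z < \infty$, and


(b) $G(z) = \int_0^z \int_{x/z}^1 1 \, dy \, dx = \frac{z}{2}$, $0 < z < 1$.


This gives the density of the quotient as


$$g(z) = \begin{cases} \frac{1}{2}, & 0 < z < 1 \\ \frac{1}{2z^2}, & z > 1. \end{cases}$$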
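The distribution functions found in Examples 5.7.3 and 5.7.4 can be compared with simulated proportions. The following is a minimal sketch; the function names, the evaluation points 0.3 and 2.0, the seed, and the trial count are all assumptions chosen for illustration.

```python
import math
import random

def G_product(z):
    """P(XY <= z) for independent uniform(0,1) X and Y, valid for 0 < z < 1."""
    return z - z * math.log(z)

def G_quotient(z):
    """P(X/Y <= z) for independent uniform(0,1) X and Y, valid for z > 0."""
    return z / 2 if z <= 1 else 1 - 1 / (2 * z)

def simulate(z_prod, z_quot, trials=200_000, seed=2):
    """Return simulated estimates of P(XY <= z_prod) and P(X/Y <= z_quot)."""
    rng = random.Random(seed)
    prod_hits = quot_hits = 0
    for _ in range(trials):
        x = rng.random()
        y = 1.0 - rng.random()  # lies in (0, 1], so x / y is always defined
        prod_hits += x * y <= z_prod
        quot_hits += x / y <= z_quot
    return prod_hits / trials, quot_hits / trials

est_prod, est_quot = simulate(0.3, 2.0)
print(round(est_prod, 3), round(G_product(0.3), 3))
print(round(est_quot, 3), round(G_quotient(2.0), 3))
```

Each printed pair should agree to roughly two decimal places.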


EXERCISES 5.7 


1. Find the probability density for Z = X/Y if X has an exponential density with mean 1 
and Y has an independent exponential density with mean 1/2. 


2. Suppose X and Y are independent observations from f(x) = 2x, 0 <x < 1. Find the 
probability density for Z = X - Y. 




3. Find the probability density for Z = X + Y where X is an observation from f(x) =
2x, 0 < x < 1, and Y is an independent observation from g(y) = 2(1 - y), 0 < y < 1.


4. Find the probability density for Z = X - Y where X and Y are independent observations 
from $f(x) = 3x^2$, 0 < x < 1.


CHAPTER REVIEW 


In this chapter, we study random variables that are defined on the same sample space and 
consider their joint probability distributions or joint probability densities 


$f(x, y) = P(X = x, Y = y)$ if X and Y are discrete and


$$P(a < X < b,\ c < Y < d) = \int_a^b \int_c^d f(x, y) \, dy \, dx \quad \text{if X and Y are continuous.}$$


Our study has been confined to two random variables producing bivariate probability dis- 
tributions or densities, although the techniques used in this chapter can often be extended 
to three or more random variables. 

X and Y are random variables and their individual densities are called marginal densi- 
ties. In the continuous case, denoting these by f(x) and g(y), respectively, 


$$f(x) = \int_{-\infty}^{\infty} f(x, y) \, dy \quad \text{and}$$
$$g(y) = \int_{-\infty}^{\infty} f(x, y) \, dx.$$


Appropriate sums are used in the discrete case. 

It is best to think of f(x, y) geometrically and to plot the surface. Slices of the surface
for which X or Y is constant produce curves which, because their areas are not usually
one, are rarely probability densities. Conditional densities can be produced, however, by
dividing the joint density by the area under the curve:


$$f(y \mid X = x) = \frac{f(x, y)}{f(x)} \quad \text{and} \quad f(x \mid Y = y) = \frac{f(x, y)}{g(y)}.$$


Bivariate distributions or densities, like univariate distributions or densities, are summarized 
by means and variances. However, unlike univariate distributions or densities, means and 
variances for bivariate distributions or densities can be calculated in more than one way. 


For example,
$$E(X) = \int x \cdot f(x) \, dx = \int\!\!\int x \cdot f(x, y) \, dy \, dx$$


with a similar result for E(Y). 
Functions of random variables are of interest, sums being of particular importance. We 
noted that 


E(X + Y) = E(X) + E(Y) 
for any random variables X and Y. However, a direct calculation shows that


Var(X + Y) = Var(X) + 2Cov(X, Y) + Var(Y) where
Cov(X, Y) = E([X - E(X)][Y - E(Y)]) denotes the covariance of X and Y.


If X and Y are independent, then Var(X + Y) = Var(X) + Var(Y). If X and Y are not inde- 
pendent, then they can be partially dependent on one another, or totally dependent on one 
another. The degree of dependence can be measured by the correlation coefficient,


$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}.$$


It is known that $-1 \le \rho \le 1$.
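As an illustration of these formulas, the sketch below computes the variances, covariance, and correlation coefficient exactly for one discrete case: X is the larger and Y the smaller face when two fair dice are thrown (the setting of one of the supplementary exercises). The helper name `expect` and the choice of example are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of X = larger face and Y = smaller face for two fair dice.
joint = {}
for a, b in product(range(1, 7), repeat=2):
    key = (max(a, b), min(a, b))
    joint[key] = joint.get(key, 0) + Fraction(1, 36)

def expect(g):
    """Expected value of g(X, Y) under the joint distribution."""
    return sum(prob * g(x, y) for (x, y), prob in joint.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - ex) ** 2)
var_y = expect(lambda x, y: (y - ey) ** 2)
cov = expect(lambda x, y: (x - ex) * (y - ey))
var_sum = expect(lambda x, y: (x + y - ex - ey) ** 2)

# Correlation coefficient, converted to floating point for the square root.
rho = float(cov) / (float(var_x) * float(var_y)) ** 0.5
print(ex + ey, var_sum == var_x + 2 * cov + var_y, round(rho, 4))
```

Because the arithmetic uses exact fractions, the identity Var(X + Y) = Var(X) + 2Cov(X, Y) + Var(Y) is checked with equality rather than a tolerance.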
We showed, for the random variables E(X|Y) and E(Y|X), that


E(E(X|Y)) = E(X) and
E(E(Y|X)) = E(Y).


Normal bivariate densities comprise an important family of bivariate densities. Their
probability densities in standard form can be written as


$$f(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}} \, e^{-\frac{x^2 - 2\rho x y + y^2}{2(1 - \rho^2)}}, \quad -\infty < x < \infty, \ -\infty < y < \infty.$$


Finally, we considered products and quotients of random variables, finding the distribution 
functions of these functions of two random variables; by differentiating the result we found 
their respective probability density functions. 


PROBLEMS FOR REVIEW 


Exercises 5.2 - #2, 4, 5, 7, 8
Exercises 5.3 - #1, 2, 5, 7
Exercises 5.5 - #2, 6, 9, 10, 13 
Exercises 5.6 - #1, 2 
Exercises 5.7 - #2, 3 


SUPPLEMENTARY EXERCISES FOR CHAPTER 5 

1. For the joint probability density function $f(x, y) = k(x^2 + y^2)$, -2 < x < 2, -2 < y < 2,
(a) Find k.
(b) Are X and Y independent?
(c) Find the marginal and conditional densities.

2. Given f(x, y) = ksin(x) cos(yje*, 0 < x < $\pi/2$, $-\pi/2$ < y < $\pi/2$.
(a) Find k.
(b) Find the marginal and conditional densities.

3. Two fair dice are thrown. Let X denote the larger face that appears and Y denote the
smaller face that appears.
(a) Find the joint probability distribution.
(b) Find the conditional distribution f(x|y) and show that
$E(X \mid Y = y) = \frac{42 - y^2}{13 - 2y}$, y = 1, 2, ..., 6.
Then use this result to find E(X).
(c) Show that $E(Y \mid X = x) = \frac{x^2}{2x - 1}$, x = 1, 2, ..., 6, and use this result to find E(Y).
(d) Explain why E(X + Y) = 7.

4. Given g(x, y) = (4) Gx-2v+, 1 < x < 3, 0 < y < 1.
(a) Find the marginal densities.
(b) Find the conditional densities.
(c) Verify that E[E[X|Y]] = E[X] and that E[E[Y|X]] = E[Y].

5. Suppose that f(x) = 1, 0 < x < 1 and g(y) = 2y, 0 < y < 1. Find P(Y < X) if X and
Y are independent.

6. Consider the joint probability density function f(x, y) = kx, x - 1 < y < 1 - x, 0 <
x < 1, for random variables X and Y.
(a) Show that k = 3.
(b) Find the marginal density of Y.
(c) Find the variance of X.

7. Suppose f(x, y) = k is a joint probability density function on the area in the first
quadrant between the curves y = 4 and $y = x^2$.
(a) Find k.
(b) Find E(X|Y = y).

8. Random variables X and Y have $g(y) = \lambda^2 y e^{-\lambda y}$, y > 0 and f(x|y) = 1/y, 0 < x < y.
(a) Find the marginal density for X.
(b) Find E[Y|X = x].

9. X is an observation from f(x) = 2x, 0 < x < 1 while Y is an observation from the same
density but Y > X. Find E(Y).

10. Show that random variables X and Y are independent if the joint probability density
function is

flx.y) =k (4e- 27 = 20? + (3) Gw?), 0 < x < 2, 0 < y < 1.

11. For what value of k is $f(x, y) = k e^{(x+y)/3}$ a joint probability density function for 0 < x < 1
and 0 < y < 1?

12. X and Y have joint probability density function f(x, y) = k, 0 < x < 2, °<y<4.
(a) Show that k= +.
(b) Find the marginal densities f(x) and g(y).
(c) Find f(x|y).
(d) Find E(X|Y = y).

13. Random variables X and Y have g(y|X = x) = = 0<y< 2x and f(x) = 24x?, O< ee
(a) Find the joint probability density function, f(x, y).
(b) Find g(y), the marginal density for Y.
(c) Are X and Y independent? Justify your answer.

14. Random variables X and Y have E(X) = -5, E(Y) = 8, $E(X^2) = 100$, $E(Y^2) = 364$,
and Cov(X, Y) = 100. Find $\rho$.

15. The joint probability density function of X and Y is given by
f(x, y) = ke(x-y), 0 < x < 1, -x < y < x.
(a) Find k.
(b) Are X and Y independent? Support your answer.
(c) Find P (x > iY i).

16. A lot of television sets contain, unknown to the manufacturer, three with defective
picture tubes, four with defective sound systems, and five that have no defective parts.
Three of the sets are selected, without replacement. Let X denote the number in the sample
with defective picture tubes and Y denote the number in the sample with defective
sound systems.
(a) Find the joint probability distribution of X and Y.
(b) Find the marginal distributions.
(c) Find the mean and variance of X.
(d) Find Cov(X, Y).
(e) Find $\rho$.
(f) Find Var(X - Y).

17. Random variables X and Y have the following characteristics: E(X) = 3, Var(X) = 10,
E(Y) = 2, Var(Y) = 30, E(X · Y) = 4.
(a) Find $E(X^2)$.
(b) Find Cov(X, Y).
(c) Find $\rho_{X,Y}$.
(d) Let Z = 3X - 2. Find $\rho_{X,Z}$.
(e) Find Var(5X + 1).
(f) Find Var(X - Y).

18. Suppose that a fair coin is tossed three times. Let X denote the total number of heads
and let Y be the number of heads on just the first toss.
(a) Find the joint probability distribution of X and Y.
(b) Calculate P(X > 1|Y = 0).
(c) Find the marginal distributions of X and Y.
(d) Calculate the correlation coefficient.

19. Let R be the region bounded by 0 < x < 1 and 0 < y < x’, and let the joint probability
density function of X and Y be given by

f(x, y) = te if (x, y) is in R
0 otherwise.

(a) Determine k.
(b) Find the marginal densities, f(x) and g(y).
(c) Are X and Y independent?
(d) Calculate P (x >iy> i)-

20. Consider a box that has five white marbles and two yellow marbles. A marble is drawn
out and not replaced, and then a second marble is chosen. Random variable X is 1 if
the first marble is yellow and 0 otherwise. Random variable Y is 1 if the second marble
is yellow and is 0 otherwise.
(a) What is the joint probability distribution of X and Y?
(b) Find the correlation coefficient.
(c) Are X and Y independent? Explain.
(d) Find Var(X + Y).

21. In four tosses of a fair coin let X denote the number of heads and Y denote the longest
run of heads. (A run of heads is a sequence of successive heads.)
(a) Show the joint probability distribution of X and Y in a table and determine the
marginal distribution of Y.
(b) Find P(X = 2|Y < 2).
(c) Find the expected value of X.

22. Suppose X and Y are independent random variables with marginal densities f(x) =
Ae~*, x > 0 and g(y) = Ae~*, y > 0.
(a) Find the joint probability density function of X and Y.
(b) Find E(X - Y).
(c) Find P(Y > 2X).

23. Let X and Y be random variables with probability density function

fe,y) = , Uses t, Cey<
Vy
(a) Find k.
(b) Find the marginal densities.
(c) Find P (v > *).

24. Let random variables X and Y denote the temperature and time in minutes it takes a
diesel engine to start, respectively. The joint density for X and Y is


f(x, y) = c(4x + 2y + 1), 0 < x < 4, 0 < y < 2.


(a) Find c.
(b) Find the marginal densities and decide whether or not X and Y are independent.
(c) Find f(x|Y = 1).
Chapter 6


Recursions and Markov Chains


6.1 INTRODUCTION


In this chapter, we study two important parts of the theory of probability: recursions and 
Markov chains. 

It happens that many interesting probability problems can be posed in a recursive man- 
ner; indeed, it is often most natural to consider some problems in this way. While we have 
seen several recursive functions in our previous work, we have not established their solu- 
tion. Some of our examples will be recalled here and we will show the solution of some 
recursions. We will also show some new problems and solve them using recursive func- 
tions. In particular, we will devote some time to a class of probability problems known as 
waiting time problems. 

Finally, we consider some of the theory of Markov chains, which arise in a number 
of practical situations. This theory is quite extensive, so we are able to give only a brief 
introduction in this book. 


6.2 SOME RECURSIONS AND THEIR SOLUTIONS 


We return now to problems involving recursions, considerably expanding our work in this 
area and considering more complex problems than we have previously. 

Consider again the simple problem of counting the number of permutations of n distinct
objects, letting $P_n$ denote the number of permutations of these n distinct objects. $P_n$ is, of
course, a function of n. Now if we were to permute n - 1 of these objects, a new object, the
nth, could be placed in any one of the n - 2 positions between the objects or in one of the
two end positions, a total of n possible positions for the nth object. For a given permutation
of the n - 1 objects, each one of the n choices for the nth object gives a distinct permutation.
This reasoning shows that


$$P_n = n \cdot P_{n-1}, \quad n > 1. \qquad (6.1)$$


Formula (6.1) expresses one value of a function, $P_n$, in terms of another value of the same
function, $P_{n-1}$. For this reason, (6.1) is called a recursion or recurrence relation or difference
equation.

We have encountered recursions several times previously in this book. Recall that in
Chapter 1, we observed that the number of combinations of n distinct objects taken r at a
time, $\binom{n}{r}$, could be characterized by the recursion


$$\binom{n}{r} = \frac{n - r + 1}{r} \binom{n}{r - 1}, \quad r = 1, 2, \ldots, n. \qquad (6.2)$$


We saw in Chapter 2 that values of the binomial probability distribution, where


$$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n,$$


are related by the recursion


$$P(X = x) = \frac{n - x + 1}{x} \cdot \frac{p}{q} \cdot P(X = x - 1), \quad x = 1, 2, \ldots, n.$$
This recursion was used to find the mean and the variance of a binomial random variable. 
One of the primary values of a recursion is that, given a starting point, any value of the 
function can be calculated. In the example regarding the permutations of n distinct objects, 
it would be natural to let $P_1 = 1$. Then we find, by repeatedly applying the recursion, that


$P_2 = 2P_1 = 2 \cdot 1 = 2$,
$P_3 = 3P_2 = 3 \cdot 2 \cdot 1 = 6$,
$P_4 = 4P_3 = 4 \cdot 3 \cdot 2 \cdot 1 = 24$,


and so on. It is easy to conjecture that the general solution of the recursion (6.1) is $P_n = n!$.
The conjecture can be proved to be correct by showing that it satisfies the original recursion.
To do this, we check that


$P_n = n \cdot P_{n-1}$, giving


$n! = n \cdot (n - 1)!$,


which is true. So we have found a solution for the recursion.
In this example, it is easy to see a pattern arising from some specific cases and this
led us to the general solution; soon we will require a specific procedure for determining
solutions for recursions where specific cases do not provide a hint of the general solution.


As another example, we saw in Chapter 1 that the solution for Equation (6.2) is
$\binom{n}{r} = \frac{n!}{r!(n - r)!}$, where a starting point is $\binom{n}{0} = 1$.
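Both recursions can be unrolled mechanically from their starting points. The sketch below does this and compares the results with the closed forms n! and n!/(r!(n - r)!); the function names are assumptions introduced for this illustration.

```python
import math

def permutations_count(n):
    """Unroll P_n = n * P_{n-1} starting from P_1 = 1."""
    p = 1
    for k in range(2, n + 1):
        p = k * p
    return p

def combinations_count(n, r):
    """Unroll C(n, r) = ((n - r + 1) / r) * C(n, r - 1) starting from C(n, 0) = 1."""
    c = 1
    for j in range(1, r + 1):
        # The product c * (n - j + 1) equals j * C(n, j), so the division is exact.
        c = c * (n - j + 1) // j
    return c

print([permutations_count(n) for n in range(1, 6)])
print([combinations_count(6, r) for r in range(7)])
```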

Solutions for recursions are, however, not always so simple. 

We want to abandon purely combinatorial examples now and turn our attention to recur- 
sions that arise in connection with problems in probability. It happens that many interesting 
problems can be described by recursions; we will show several of these, together with the 
solutions of these difference equations. 


We begin with an example. 


Example 6.2.1 


A quality control inspector, thinking that he might make his work a bit easier, decides on the
following inspection plan as either good or nonconforming items come off his assembly
line: if an item is inspected, the next item is inspected with probability p; if an item is not
inspected, the next item will be inspected with probability 1 - p. The inspector decides to
make p small, hoping that he inspects only a few items this way. Does his plan work?

A model of the situation can be constructed by letting $a_n$ denote the probability that
the nth item is inspected.


So $a_n = P(\text{nth item is inspected})$.


The nth item will be subject to inspection in two mutually exclusive ways: either the (n - 1)st
item is inspected, or it is not. Therefore,


$$a_n = p\,a_{n-1} + (1 - p)(1 - a_{n-1}) \quad \text{for } n \ge 2. \qquad (6.3)$$


Letting $a_1 = 0$ (so the first item from the production line is not inspected), Equation (6.3)
then gives the following values for some small values of n:


$a_1 = 0$

$a_2 = 1 - p$

$a_3 = p(1 - p) + (1 - p)p = 2p(1 - p)$

$a_4 = 2p^2(1 - p) + (1 - p)[1 - 2p(1 - p)] = 1 - 3p + 6p^2 - 4p^3$.


If further values are required, it is probably most sensible to proceed using a computer 
algebra system, which will calculate any number of these values. This is, in fact, one of the 
most useful features of a recursion — within reason, any of the values can be calculated from 
the problem itself. In the short term then, the question may be as valuable as the answer! 
In some cases, the question is more valuable than the answer since the answer may be very 
complex; in many cases the answer cannot be found at all, so we must be satisfied with a 
number of special cases. 
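A few lines of code can stand in for the computer algebra system when only numerical values are wanted. The sketch below applies the recursion (6.3) directly with exact fractions; the function name and the choice p = 1/4 are assumptions made for illustration.

```python
from fractions import Fraction

def inspection_probabilities(p, n_max, a1=0):
    """Return [a_1, ..., a_n_max] from a_n = p*a_{n-1} + (1 - p)*(1 - a_{n-1})."""
    values = [a1]
    for _ in range(2, n_max + 1):
        prev = values[-1]
        values.append(p * prev + (1 - p) * (1 - prev))
    return values

p = Fraction(1, 4)
a = inspection_probabilities(p, 10)
print([str(v) for v in a[:4]])   # a_1 through a_4 as exact fractions
print(float(a[-1]))              # a_10 as a decimal
```

Changing `p` or `n_max` gives as many further values as needed, which is the sense in which the recursion itself can be as useful as its solution.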

To return to the problem, how then do the values of $a_n$ behave as n increases? First, we
note that if p = 1/2, then $a_2 = a_3 = a_4 = 1/2$, prompting us to look at (6.3) when p = 1/2.
It is easy to see that in that case, $a_n = 1/2$ for all $n \ge 2$, so that if the inspector takes
p = 1/2, then he will inspect 1/2 of the items.

The inspector now searches for other values of p, hoping to find some that lead to 
less work. 

A graph of $a_{10}$ as a function of p, found using a computer algebra system, is shown in
Figure 6.1.

This shows the inspector that, alas, any reasonable value for p leads to inspection about 
half the time! The graph indicates that $a_{10}$ is very close to 1/2 for : <p< 5, but even
values of p outside this range only affect the value of $a_n$ in the early stages. Figure 6.2
shows a graph of $a_n$ for p = 1/4, which indicates the very rapid convergence to 1/2.

Figure 6.1 $a_{10}$ for the quality control inspector.

Figure 6.2 $a_n$ for p = 1/4.

This evidence also suggests writing the solutions for (6.3) in terms of $(p - \frac{1}{2})$. Doing
that, we find:


$$a_2 = 1 - p = \frac{1}{2} - \left(p - \frac{1}{2}\right),$$

$$a_3 = 2p(1 - p) = \frac{1}{2} - 2\left(p - \frac{1}{2}\right)^2,$$

$$a_4 = 1 - 3p + 6p^2 - 4p^3 = \frac{1}{2} - 2^2\left(p - \frac{1}{2}\right)^3.$$


This strongly suggests that the general solution for (6.3) is


$$a_n = \frac{1}{2} - 2^{n-2}\left(p - \frac{1}{2}\right)^{n-1}.$$
Direct substitution in (6.3) will verify that this conjecture is, in fact, correct. Since
$|p - \frac{1}{2}| < \frac{1}{2}$ for 0 < p < 1, we see that $a_n \to \frac{1}{2}$ as $n \to \infty$. This could also have been predicted from
(6.3), since, if $a_n \to L$, then $a_{n-1} \to L$ and so, from (6.3),


$$L = pL + (1 - p)(1 - L),$$


whose solution is $L = \frac{1}{2}$.
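The direct substitution can be written out as follows. This is a sketch of the check, using the equivalent form $a_n = \frac{1}{2} - \frac{1}{2}(2p - 1)^{n-1}$ of the conjectured solution, which follows from $2^{n-2}(p - \frac{1}{2})^{n-1} = \frac{1}{2}(2p - 1)^{n-1}$.

```latex
\begin{aligned}
p\,a_{n-1} + (1 - p)(1 - a_{n-1})
  &= p\left[\tfrac{1}{2} - \tfrac{1}{2}(2p - 1)^{n-2}\right]
   + (1 - p)\left[\tfrac{1}{2} + \tfrac{1}{2}(2p - 1)^{n-2}\right] \\
  &= \tfrac{1}{2} + \tfrac{1}{2}(1 - 2p)(2p - 1)^{n-2} \\
  &= \tfrac{1}{2} - \tfrac{1}{2}(2p - 1)^{n-1} = a_n .
\end{aligned}
```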
Our solution used the fact that the first item was not inspected, and so it may be thought
that the long-range behavior of $a_n$ may well be dependent on that assumption. This, however,
is not true; in an exercise, the reader will be asked to show that $a_n \to \frac{1}{2}$ as $n \to \infty$ if
$a_1 = 1$, that is, if the first item is inspected.

We used graphs and several specific values of $a_n$ above to conjecture the solution of the
recursion as well as its long-term behavior. While it is often possible to find exact solutions 
for recursions, often the behavior of the solution for large values of n is of most interest; 
this behavior can frequently be predicted when exact solutions are not available. However, 
an exact solution for (6.3) can be constructed, and since the solution is typical of that of 
many kinds of recursions, it is shown now. 


Solution of the Recursion (6.3) 


While many computer algebra systems solve recursions, we give here some indication of 
the algebraic steps involved in this example as well as in others in this chapter. 
We begin with the recursion 


$$a_n = p\,a_{n-1} + (1 - p)(1 - a_{n-1}) \quad \text{for } n \ge 2$$


and write it as

$$a_n - (2p - 1)a_{n-1} = 1 - p, \quad n \ge 2.$$


Note first that the recursion has coefficients that are constants; they are not dependent on
the variable n. Note also that the right-hand side of the equation is constant as well. We will
consider here recursions having constant coefficients for the variables, but possibly having
functions of n on the right-hand side.

The solution of these equations is known to be composed of two parts: the solution of
the homogeneous equation, in this case


$$a_{n,h} - (2p - 1)a_{n-1,h} = 0,$$

and a particular solution, some specific solution of

$$a_{n,p} - (2p - 1)a_{n-1,p} = 1 - p.$$

The general solution is known to be the sum of $a_{n,h}$ and $a_{n,p}$:


$$a_n = a_{n,h} + a_{n,p}.$$


We now show how to determine these two components of the solution.
The homogeneous equation may be written as


$$a_{n,h} = (2p - 1)a_{n-1,h}.$$


This suggests that a value of the function, $a_{n,h}$, is a constant multiple of the previous value,
$a_{n-1,h}$. Suppose then that


$$a_{n,h} = r^n \quad \text{for some constant } r.$$


It follows that

$$r^n - (2p - 1)r^{n-1} = 0,$$

from which we conclude that r = 2p - 1. Since the equation is homogeneous, a constant
multiple of a solution is also a solution, so


$$a_{n,h} = c(2p - 1)^n \quad \text{where } c \text{ is some constant.}$$


The equation $r^n - (2p - 1)r^{n-1} = 0$ is often called the characteristic equation. The
solutions for r are called characteristic roots.
The particular solution is some specific solution of


$$a_{n,p} - (2p - 1)a_{n-1,p} = 1 - p.$$

Since the right-hand side is a constant, we try a constant, say k, for $a_{n,p}$. The situation when
the right-hand side is a function of n is considerably more complex; we will encounter some
of these equations later in this chapter. Substituting the constant k into (6.3) gives


$$k - (2p - 1)k = 1 - p, \quad \text{whose solution is } k = \frac{1}{2}.$$

The complete solution for the recursion is the sum of the homogeneous solution and a
particular solution, so


$$a_n = \frac{1}{2} + c(2p - 1)^n.$$


Now $a_1 = 0$, giving $c = -\frac{1}{2(2p - 1)}$. Writing $2p - 1 = 2(p - \frac{1}{2})$, we find that


$$a_n = \frac{1}{2} - 2^{n-2}\left(p - \frac{1}{2}\right)^{n-1} \quad \text{for } n \ge 1,$$


which is the solution previously found.
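The two-part method used above (a homogeneous solution plus a constant particular solution) can be sketched in code for any recursion of the form $a_n - r\,a_{n-1} = d$ with constant r and d. The function names below are hypothetical, and the sketch assumes r is neither 0 nor 1; for the inspector's problem, r = 2p - 1 and d = 1 - p.

```python
def solve_first_order(r, d, a1):
    """Closed form for a_n - r*a_{n-1} = d (n >= 2) with a_1 given; assumes r not in {0, 1}.

    Homogeneous part: c * r**n.  Particular part: the constant k = d / (1 - r).
    """
    k = d / (1 - r)
    c = (a1 - k) / r          # chosen so that k + c*r equals a_1
    return lambda n: k + c * r ** n

def iterate(r, d, a1, n):
    """Apply the recursion directly n - 1 times, starting from a_1."""
    a = a1
    for _ in range(n - 1):
        a = r * a + d
    return a

p = 0.3
closed = solve_first_order(2 * p - 1, 1 - p, a1=0.0)
for n in (1, 2, 5, 10):
    print(n, round(closed(n), 6), round(iterate(2 * p - 1, 1 - p, 0.0, n), 6))
```

The two printed columns should coincide, since the closed form and the iteration describe the same sequence.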
The reader who wants to learn more about the solution of recursions is urged to read 
Grimaldi [16] or Goldberg [14]. We proceed now with some more difficult examples. 


Example 6.2.2 


We considered, in Chapter 1, the sample space when a loaded coin is flipped until two heads
in a row occur. The reader may recall that if the event HH occurs for the first time at the nth
toss, then the number of points in the sample space for n tosses can be predicted from that of
n - 1 tosses and n - 2 tosses by the Fibonacci sequence. We did not, at that time, discover
a formula for the probability that the event will occur for the first time at the nth toss; we
will do so now.

Let $b_n$ denote the probability that HH appears for the first time at the nth trial. Now
consider a sequence of n trials for which HH appears for the first time at the nth trial. Since
such a sequence can begin with either a tail or a head followed immediately by a tail (so
that HH does not appear on the second trial),


$$b_n = q\,b_{n-1} + pq\,b_{n-2}, \quad n \ge 3 \qquad (6.4)$$


is a recursion describing the problem.
We also take $b_1 = 0$ and $b_2 = p^2$.
This recursion gives the following values for some small values of n:


$b_3 = qp^2$

$b_4 = qp^2$

$b_5 = q^2p^2(1 + p)$

$b_6 = q^2p^2(1 + qp)$

$b_7 = p^2q^3(1 + pq + p + p^2)$.


It is now very difficult to detect a pattern in the results (although there is one). The behavior
of $b_n$ can be seen in the graphs in Figures 6.3 and 6.4 where the values of $b_n$ are shown for
a fair coin and then a loaded coin.


Figure 6.3 $b_n$ for a fair coin.

Figure 6.4 $b_n$ for a coin with p = 3/4.


We proceed then with the solution of $b_n = q\,b_{n-1} + pq\,b_{n-2}$, $n \ge 3$, $b_1 = 0$, $b_2 = p^2$.
Here, the equation is homogeneous and we write it as


$$b_n - q\,b_{n-1} - pq\,b_{n-2} = 0.$$

The solution in this case is similar to that of Example 6.2.1. If we presume that $b_n = r^n$, the
characteristic equation becomes


$$r^2 - qr - pq = 0,$$


giving two distinct characteristic roots, $r = \frac{q \pm \sqrt{q^2 + 4pq}}{2}$. Since the sum of solutions for a
linear homogeneous equation must also be a solution, and since a constant multiple of a
solution is also a solution, the general solution is


$$b_n = c_1\left(\frac{q + \sqrt{q^2 + 4pq}}{2}\right)^n + c_2\left(\frac{q - \sqrt{q^2 + 4pq}}{2}\right)^n.$$


The constants $c_1$ and $c_2$ can be determined from the boundary conditions $b_1 = 0$, $b_2 = p^2$,
giving

$$b_n = \frac{2p^2}{q\sqrt{q^2 + 4pq} + q^2 + 4pq}\left(\frac{q + \sqrt{q^2 + 4pq}}{2}\right)^n - \frac{2p^2}{q\sqrt{q^2 + 4pq} - q^2 - 4pq}\left(\frac{q - \sqrt{q^2 + 4pq}}{2}\right)^n. \qquad (6.5)$$


Computer algebra systems often solve recursions. Such a system solves the recursion
(6.4) in the case where p = q = 1/2 as


$$b_n = \frac{5 - \sqrt{5}}{10}\left(\frac{1 + \sqrt{5}}{4}\right)^n + \frac{5 + \sqrt{5}}{10}\left(\frac{1 - \sqrt{5}}{4}\right)^n.$$


This result can also be found by substituting p = q = 1/2 in (6.5). Other values for p and q
make the solution much more complex.
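Formula (6.5) can be checked numerically against the recursion (6.4). The sketch below evaluates both for a loaded coin; the function names and the choice p = 3/4 are assumptions made for this illustration, and the comparison uses floating-point arithmetic with a small tolerance.

```python
import math

def b_recursive(p, n_max):
    """Return {n: b_n} for n = 1..n_max from b_n = q*b_{n-1} + p*q*b_{n-2}."""
    q = 1 - p
    b = {1: 0.0, 2: p * p}
    for n in range(3, n_max + 1):
        b[n] = q * b[n - 1] + p * q * b[n - 2]
    return b

def b_closed(p, n):
    """Evaluate the two-root closed form (6.5)."""
    q = 1 - p
    s = math.sqrt(q * q + 4 * p * q)
    r1, r2 = (q + s) / 2, (q - s) / 2          # the characteristic roots
    c1 = 2 * p * p / (q * s + q * q + 4 * p * q)
    c2 = -2 * p * p / (q * s - q * q - 4 * p * q)
    return c1 * r1 ** n + c2 * r2 ** n

b = b_recursive(0.75, 12)
print(all(math.isclose(b[n], b_closed(0.75, n), abs_tol=1e-12) for n in b))
```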


Mean and Variance 


The recursion $b_n = q\,b_{n-1} + pq\,b_{n-2}$, $n \ge 3$, can be used to determine the mean and variance
of N, the number of tosses necessary to achieve two heads in a row. We use a technique
that is very similar to the one we used in Chapter 2 to find means and variances of random
variables. Multiplying the recursion through by n and summing from 3 to infinity gives


$$\sum_{n=3}^{\infty} n b_n = \sum_{n=3}^{\infty} q n b_{n-1} + \sum_{n=3}^{\infty} pq n b_{n-2}.$$


This becomes


$$\sum_{n=2}^{\infty} n b_n - 2b_2 = q \sum_{n=3}^{\infty} [(n - 1) + 1] b_{n-1} + pq \sum_{n=3}^{\infty} [(n - 2) + 2] b_{n-2}.$$


Expanding and simplifying gives


$$E[N] - 2b_2 = qE[N] + q + pqE[N] + 2pq,$$

from which it follows that

$$E[N] = \frac{1 + p}{p^2}.$$

If p = 1/2, this gives an average waiting time of six tosses to achieve two heads in a row. The
result differs a bit from $1/p^2$, a result that might be anticipated from the geometric distribution,
but we note that the variable is not geometric here.


The variance is calculated in much the same way. We begin with


$$\sum_{n=3}^{\infty} n(n - 1) b_n = q \sum_{n=3}^{\infty} [(n - 1)(n - 2) + 2(n - 1)] b_{n-1} + pq \sum_{n=3}^{\infty} [(n - 2)(n - 3) + 4(n - 2) + 2] b_{n-2}.$$


Expanding and simplifying gives


$$E[N(N - 1)] - 2b_2 = qE[N(N - 1)] + 2qE[N] + pqE[N(N - 1)] + 4pqE[N] + 2pq.$$


It follows then that $\mathrm{Var}(N) = \frac{1 + 2p - 2p^2 - p^3}{p^4}$. If p = 1/2, Var(N) = 22.
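The mean and variance can also be approximated by summing the series directly, which gives a numerical check on the algebra. In the sketch below the function name and the truncation point `n_max` are assumptions; the truncation error is negligible because $b_n$ decreases geometrically.

```python
def waiting_time_moments(p, n_max=2000):
    """Approximate the total probability, E[N], and Var(N) for the first HH.

    The sums of b_n, n*b_n, and n*n*b_n are truncated at n_max.
    """
    q = 1 - p
    prev2, prev1 = 0.0, p * p                      # b_1 and b_2
    total, mean, second = prev1, 2 * prev1, 4 * prev1
    for n in range(3, n_max + 1):
        b_n = q * prev1 + p * q * prev2            # recursion (6.4)
        total += b_n
        mean += n * b_n
        second += n * n * b_n
        prev2, prev1 = prev1, b_n
    return total, mean, second - mean ** 2

for p in (0.5, 0.75):
    total, mean, var = waiting_time_moments(p)
    print(p, round(total, 6), round(mean, 6), (1 + p) / p ** 2,
          round(var, 6), (1 + 2 * p - 2 * p ** 2 - p ** 3) / p ** 4)
```

For each p, the summed mean and variance should match the values given by the two formulas printed beside them.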
We turn now to other examples. 


Example 6.2.3 


Consider again a sequence of Bernoulli trials with p the probability of success at a single
trial. In a series of n trials, what is the probability that the sequence SF never appears?


Here a few points in the sample space will assist in seeing a recursion for the 
probability: 


n=2 SS 
FS 
FF 


n=3 SSS 
FSS 
FFS 
FFF 


n=4  SSSS 
FSSS 
FFSS 
FFFS 
FFFF. 
It is now evident that a sequence in which SF never appears can arise in one of two mutually
exclusive ways: either the sequence is all F’s or, when an S appears, it must then be followed
by all S’s. The latter sequences end in S and are preceded by a sequence of n - 1 trials in
which no sequence SF appears. So, if $u_n$ denotes the probability a sequence of n trials never
contains the sequence SF, then


$$u_n = q^n + p\,u_{n-1}, \quad n \ge 2, \quad u_1 = 1 \quad \text{or}$$


$$u_n - p\,u_{n-1} = q^n, \quad n \ge 2, \quad u_1 = 1. \qquad (6.6)$$


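A few values of $u_n$ are as follows:


$u_1 = 1$,
$u_2 = q^2 + p$,
$u_3 = q^3 + pq^2 + p^2$.


These can be rewritten as


$$u_1 = \frac{q^2 - p^2}{q - p} \quad \text{if } p \ne q,$$

$$u_2 = \frac{q^3 - p^3}{q - p} \quad \text{if } p \ne q,$$

$$u_3 = \frac{q^4 - p^4}{q - p} \quad \text{if } p \ne q.$$


This leads to the conjecture that $u_n = \frac{q^{n+1} - p^{n+1}}{q - p}$, $n \ge 1$, $p \ne q$. The validity of this can be
seen by substituting $u_n$ into the original recursion (6.6). Substituting in the right-hand side
of (6.6), we find, provided that $p \ne q$,


$$q^n + p \cdot \frac{q^n - p^n}{q - p},$$

and this simplifies to

$$\frac{q^{n+1} - p^{n+1}}{q - p},$$

verifying the solution.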


The solution of the recursion (6.6) is also easy to construct directly. The characteristic
equation is

$$r - p = 0,$$

giving the characteristic root r = p. Therefore, the homogeneous solution is


$$u_{n,h} = c\,p^n.$$


Now we seek a particular solution.
Since the right-hand side of $u_n - p\,u_{n-1} = q^n$ is $q^n$, suppose that we try

$$u_{n,p} = k q^n.$$

Then, substituting in the recursion, we have


$$k q^n - k p q^{n-1} = q^n$$


and we find that

$$k = \frac{q}{q - p}, \quad \text{provided that } q \ne p.$$


So $u_{n,p} = \frac{q^{n+1}}{q - p}$, $q \ne p$, and the general solution is

$$u_n = c\,p^n + \frac{q^{n+1}}{q - p}, \quad \text{provided that } q \ne p.$$


By imposing the condition $u_1 = 1$ and simplifying, we find, as before, that


$$u_n = \frac{q^{n+1} - p^{n+1}}{q - p}, \quad n \ge 1, \quad q \ne p. \qquad (6.7)$$


We now investigate the case when p = q. In that case, (6.6) becomes


$$u_n - \frac{1}{2} u_{n-1} = \left(\frac{1}{2}\right)^n. \qquad (6.8)$$


Now the homogeneous solution is


$$u_{n,h} = c\left(\frac{1}{2}\right)^n,$$

so a particular solution $u_{n,p} = k\left(\frac{1}{2}\right)^n$, a natural choice for $u_{n,p}$, will only produce 0 when
substituted into the left-hand side of (6.8). We try the function


$$u_{n,p} = k \cdot n \cdot \left(\frac{1}{2}\right)^n$$

for the particular solution.
Then the left-hand side of (6.8) becomes


$$k \cdot n \cdot \left(\frac{1}{2}\right)^n - \frac{1}{2} k (n - 1)\left(\frac{1}{2}\right)^{n-1}.$$


This simplifies to $k \cdot \left(\frac{1}{2}\right)^n$, so k = 1, and we have found then the general solution:


$$u_n = c \cdot \left(\frac{1}{2}\right)^n + n \cdot \left(\frac{1}{2}\right)^n.$$


The boundary condition $u_1 = 1$ gives c = 1. The solution in this case is


$$u_n = (n + 1)\left(\frac{1}{2}\right)^n.$$

This solution can also be found by using L’Hospital’s rule in (6.7). In this case, we have


$$\lim_{p \to 1/2} \frac{q^{n+1} - p^{n+1}}{q - p} = \lim_{p \to 1/2} \frac{(1 - p)^{n+1} - p^{n+1}}{1 - 2p} = \lim_{p \to 1/2} \frac{-(n + 1)(1 - p)^n - (n + 1)p^n}{-2} = (n + 1)\left(\frac{1}{2}\right)^n.$$
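Both forms of the solution in Example 6.2.3 can be checked by brute force for small n, by listing every sequence of S's and F's and adding the probabilities of those that never contain SF. The sketch below does this with exact fractions; the function names and the test values p = 1/3 and p = 1/2 are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def u_bruteforce(n, p):
    """P(no 'SF' in n Bernoulli trials), summed over all 2**n outcome strings."""
    q = 1 - p
    total = 0
    for trial in product("SF", repeat=n):
        s = "".join(trial)
        if "SF" not in s:
            total += p ** s.count("S") * q ** s.count("F")
    return total

def u_closed(n, p):
    """Formula (6.7) when p != q, and (n + 1) / 2**n when p = q = 1/2."""
    q = 1 - p
    if p == q:
        return Fraction(n + 1, 2 ** n)
    return (q ** (n + 1) - p ** (n + 1)) / (q - p)

for p in (Fraction(1, 3), Fraction(1, 2)):
    print(p, all(u_bruteforce(n, p) == u_closed(n, p) for n in range(1, 9)))
```

Enumeration is practical only for small n, since the number of sequences doubles with each trial; the closed forms have no such limit.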


EXERCISES 6.2 


1. In Example 6.2.1, show that $a_n \to \frac{1}{2}$ as $n \to \infty$ if $a_1 = 1$ (the first item is inspected).


2. Solve the recursion $L_n = L_{n-1} + L_{n-2}$ where $L_0 = 2$ and $L_1 = 1$. The result determines
the Lucas numbers.


Exercises 3-5 refer to a sequence of Bernoulli trials, where p is the probability of
an event and p + q = 1.


3. Describe a problem for which the recursion $a_n = q\,a_{n-1}$, n > 1, where $a_1 = 1 - q$ is
appropriate. Then solve the recursion verifying that it does in fact describe the problem.
4. Let $a_n$ denote the probability that a sequence of n Bernoulli trials with probability of
success p has an odd number of successes.
(a) Show that $a_n = p(1 - a_{n-1}) + q\,a_{n-1}$, for $n \ge 1$ if $a_0 = 0$. [Hint: Condition on the
result of the first toss.]
(b) Solve the recursion in part (a).


5. (a) Find a recursion for the probability, a, that at least two successes occur in n 


Bernoulli trials. 
(b) In part (a), let p = 1/2. Show that a, = 1 —- 
Fibonacci sequence 1, 1, 2, 3,5, 8, ... 


n? 


fs , where f,, is the nth term in the 


6. Two machines in a manufacturing plant produce items that are either good (g) or unacceptable
(u). Machine 1 has produced g good and u unacceptable items, while the
situation with machine 2 is exactly the reverse; it has produced u good items and g
unacceptable items. An inspector is required to sample the production of the machines,
and, to achieve a random order of items from each of the machines, the following plan 
is devised: 

1. The first item is drawn from the output of machine 1. 

2. Drawn items are returned to the output of the machine from which they were drawn. 
3. If a sampled item is good, the next item is drawn from the first machine. If the 
sampled item is unacceptable, the next item is drawn from the second machine. 

What is the probability that the nth sampled item is good? 


7. A basketball player makes a series of free throw attempts. If he makes a shot, he makes
the next one also with probability $p_1$. However, if he misses a shot, he makes the next
one with probability $p_2$ where $p_1 \ne p_2$. If he makes the first shot, what is the probability
he makes the nth shot?


8. A coin loaded to come up heads with probability p is tossed until the sequence TH 
occurs for the first time. Let a, denote the probability that the sequence TH occurs for 
the first time at the nth toss. 

(a) Show that $a_n = p\,a_{n-1} + p q^{n-1}$, if $n \ge 3$ where $a_1 = 0$ and $a_2 = pq$.
(b) Show that the average waiting time for the first occurrence of the sequence TH is
1/pq.
(c) If p = q = 1/2, show that $a_n = \frac{n - 1}{2^n}$, $n \ge 2$.
9. A party game starts with the host saying “yes” to the first person. This message is
passed to the other guests in this way: if a person hears “yes,” he passes that on with 
probability 3/4; however, if a person hears “no”, that is passed on with probability 2/3. 


(a) What is the probability the nth person hears “yes”? 


(b) Suppose we want the probability that the seventh person hears “yes” to be about 
1/2. What should be the probability that a “no” response is correctly passed on? 


10. A coin has a 1 marked on one side and a 2 on the other side. It is tossed repeatedly, and
the cumulative sum that has occurred is recorded. Let \(p_n\) denote the probability that the
sum is n at some time during the course of the game.


(a) Show a sample space for several possible sums. Explain why the number of points
in the sample space follows the Fibonacci sequence.


(b) Find an expression for \(p_n\), assuming the coin is fair.
(c) Show that \(p_n \to 2/3\) as \(n \to \infty\).


(d) How should the coin be loaded so as to make \(p_{17}\), the probability that the sum
becomes 17 at some time, as large as possible?


11. Find the mean for the waiting time for the pattern HHH in tossing a coin loaded to 
come up heads with probability p. 


6.3 RANDOM WALK AND RUIN 


We now show an interesting application of recursions and their solutions, namely, that of 
random walk problems. 

The theory of random walk problems is applicable to problems in many fields. We 
begin with a gambling situation to illustrate one approach to the problem. (Another avenue 
of approach to the problem will be shown in the sections on Markov chains later in this 
chapter.) 

Suppose a gambler is playing a game against an opponent where, at each trial, the 
gambler wins $1 from his opponent or loses $1 to his opponent. If in the course of play, 
the gambler loses all his money, then he is ruined; on the other hand, if he wins all his 
opponent’s money, he wins the game. We want to find then the probability, \(a_g\), that the
player (the gambler) wins the game with an initial fortune of $g while his opponent (the
house) initially has $h.

While the probability of winning at any particular trial is of obvious importance, we 
will see that, in addition to this, the probability the gambler wins the game is highly depen- 
dent on the amount of money with which he starts, as well as the amount of money the 
house has. 

The game can be won with a fortune of $n under two mutually exclusive circumstances:
the gambler wins the next trial (say with probability p) and goes on to win the game with
a fortune of $(n + 1), or he loses the next trial with probability 1 − p = q and subsequently
wins the game with a fortune of $(n − 1). This leads to the recursion

\[ a_n = p\,a_{n+1} + q\,a_{n-1}, \qquad a_0 = 0, \quad a_{g+h} = 1. \]

The characteristic equation is \(p r^2 - r + q = 0\), which has roots 1 and q/p.
Assuming that \(p \neq q\), the solution is then of the form

\[ a_n = A + B\left(\frac{q}{p}\right)^n. \]

Using the fact that \(a_0 = 0\) gives 0 = A + B, so \(a_n = A\left[1 - (q/p)^n\right]\).
Using the fact that \(a_{g+h} = 1\) produces the solution

\[ a_n = \frac{1 - (q/p)^n}{1 - (q/p)^{g+h}}, \qquad q \neq p. \]

So, in particular,

\[ a_g = \frac{1 - (q/p)^g}{1 - (q/p)^{g+h}}, \qquad q \neq p. \tag{6.9} \]

Formula (6.9) gives some interesting numerical results. Suppose that p = 0.49 so that the 
game is slightly unfavorable to the gambler, and that the gambler initially has g = $10. The 
following table shows the probability the gambler wins the game for various fortunes ($h)
for his opponent:


$h Probability of winning 
$10 0.401300 
$14 0.305146 
$18 0.238174 
$22 0.189394 
$26 0.152694 
$30 0.124404. 
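
Formula (6.9) is simple to evaluate directly. The following Python sketch reproduces the entries of the table; the function name `win_probability` is our own choice for illustration, and the code assumes p ≠ 1/2 so that q/p ≠ 1.

```python
def win_probability(g, h, p):
    """Probability that a gambler starting with g wins all g + h.

    Uses a_g = (1 - (q/p)^g) / (1 - (q/p)^(g+h)); valid only when p != 1/2.
    """
    q = 1.0 - p
    r = q / p
    return (1.0 - r ** g) / (1.0 - r ** (g + h))


if __name__ == "__main__":
    # Gambler starts with $10; opponent's fortune h varies as in the table.
    for h in (10, 14, 18, 22, 26, 30):
        print(f"${h:<4d} {win_probability(10, h, 0.49):.6f}")
```

Because the formula depends on the fortunes only through g and g + h, the same function also answers questions about stopping at a target: a gambler with $50 who quits at $60 corresponds to g = 50 and h = 10.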


One conclusion that can be drawn from the table is that the probability the gambler wins 
drops rapidly as his opponent’s fortune increases. Figure 6.5 shows graphs of the probability 
of winning with $g as a function of the opponent’s fortune, $h. 

Figure 6.5 Probability of winning the gambler’s ruin with an initial fortune of $10 against an opponent with
an initial fortune of $N with p = 0.49. (Vertical axis: probability of winning; horizontal axis: N.)

Although the game is slightly (and only slightly) adverse for the gambler, the gambler
still has, under some combinations of fortunes, a remarkably large probability of winning
the game. If the opponent’s fortune increases, however, that probability becomes very small
very quickly. The following table shows, for p = 0.49, the probability that a gambler with
an initial fortune of $10 wins the game over an opponent with an initial fortune of $h:


$h Probability of winning 
$90 0.00917265 
$98 0.00662658 
$106 0.00479399 
$114 0.00347174 
$122 0.00251603
$130 0.00182438. 


Now the gambler has little chance of winning the game, but his best chance occurs when 
his opponent has the least money or, equivalently, when the ratio of the gambler’s fortune to 
that of his opponent is as large as possible. It is interesting to examine the surface generated 
by various initial fortunes and values of p. This surface, for h = $30, is shown in Figure 6.6. 
The contours of this surface appear in Figure 6.7. 

The chance of winning the game then is very slim, but that is because the gambler 
must exhaust his opponent’s fortune and he has little hope of doing this. While the game 
is slightly unfavorable to the gambler, the real problem lies in the fact that the ratio of the 
gambler’s fortune to that of his opponent is small. When this ratio increases, so does the 
gambler’s chance of winning. These observations suggest two courses of action: 


1. the gambler revises his plans and quits playing the game when he has achieved a 
certain fortune (not necessarily that of his opponent) or 


2. the gambler bets larger amounts on each play of the game. 


Either strategy will increase the player’s chance of meeting his goals. 
For example, given the game where p = 0.49 and an initial fortune of $10, the gambler’s 
chance of doubling his money to $20 is 0.4013, regardless of the fortune of his opponent. 


Figure 6.6 Probability of winning the gambler’s ruin. The player has an initial fortune of $g; p is the probability
an individual game is won. The opponent initially has $30.

Figure 6.7 A contour plot of the gambler’s ruin problem.


The probability that a player with $50 will reach $60 in a game where he has probability 
0.49 of winning on any play is about 0.64. Clearly, the lesson here is that modest goals have a 
fair chance of being achieved, but the gambler must stop playing when he has reached them. 

The following table shows the probability of winning a game against an opponent with
initial fortune of $100 when the bet on each play is $25. The player’s initial fortune is $g.
Again, p = 0.49.


$g Probability of winning
$25 0.184326
$50 0.307047
$75 0.394564

Betting as much as you can in gambling games unfavorable to the gambler can then be 
combined with alternative strategies as the gambler’s fortune increases (if, in fact, it does!) 
to increase the gambler’s chance of winning the game. 
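
Betting a fixed larger amount on each play amounts to measuring both fortunes in units of the bet, so the same ruin formula applies with g and h divided by the bet. A minimal sketch under that reading; the function name and the requirement that both fortunes be whole multiples of the bet are our assumptions:

```python
def win_probability_with_bet(g, h, p, bet):
    """Ruin formula with fortunes measured in units of the bet (assumes p != 1/2)."""
    if g % bet or h % bet:
        raise ValueError("fortunes must be whole multiples of the bet")
    units_g, units_h = g // bet, h // bet
    r = (1.0 - p) / p
    return (1.0 - r ** units_g) / (1.0 - r ** (units_g + units_h))


if __name__ == "__main__":
    for g in (25, 50, 75):
        print(f"${g:<4d} {win_probability_with_bet(g, 100, 0.49, 25):.6f}")
```

Comparing `win_probability_with_bet(25, 100, 0.49, 25)` with `win_probability_with_bet(25, 100, 0.49, 1)` shows how sharply the larger bet improves the gambler's chance in an unfavorable game.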

We turn now to the expected duration of the game. 


Expected Duration of the Game 


Let \(E_n\) denote the expected duration of the game if the gambler has $n. Since playing
the next trial, whether it is won or lost, adds 1 to the duration,

\[ E_n = p\,E_{n+1} + q\,E_{n-1} + 1, \qquad E_0 = 0, \quad E_{g+h} = 0. \]


Figure 6.8 Expected duration of the gambler’s ruin when the gambler initially has $g and the house has
$(100 − g). (Vertical axis: number of games; horizontal axis: g.)

This recursion is very similar to that for \(a_n\), differing only in the boundary conditions and
in the appearance of the constant 1. The characteristic roots are 1 and q/p again, and so the
solution is of the form

\[ E_n = A + B\left(\frac{q}{p}\right)^n + C\,n, \qquad q \neq p. \]


Here, the term C · n represents the particular solution since a constant is a solution to the
homogeneous equation.
The constant C must satisfy the equation Cn − pC(n + 1) − qC(n − 1) = 1, so
C = 1/(q − p). The boundary conditions are then imposed giving the result

\[ E_n = \frac{n}{q-p} - \frac{g+h}{q-p}\cdot\frac{1 - (q/p)^n}{1 - (q/p)^{g+h}}, \qquad q \neq p. \]

In particular, since the gambler starts with $g,

\[ E_g = \frac{g}{q-p} - \frac{g+h}{q-p}\cdot\frac{1 - (q/p)^g}{1 - (q/p)^{g+h}}, \qquad q \neq p. \]

Some particular results from this formula may be of interest. Assume again that the game is
slightly unfavorable to the gambler, so that p = 0.49, and assume that the combined fortune is
N = g + h = $100. The game then has expected length 454 trials if the gambler starts with $10.

With how much money should the gambler start in order to maximize the expected 
length of the game? If the gambler and the house have a combined fortune of $100, a com- 
puter algebra system shows that the maximum expected length of the series is about 2088 
games, occurring when g = $65 (and h = $35). A graph of the expected duration of the 
game if g + h = $100 is shown in Figure 6.8. 
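
In place of a computer algebra system, a few lines of Python can check these figures. The sketch below evaluates the expected-duration formula for p ≠ 1/2 and searches the whole-dollar starting fortunes for the longest expected game, taking the combined fortune to be $100; the function name `expected_duration` is ours.

```python
def expected_duration(g, h, p):
    """Expected number of plays until one player is ruined (assumes p != 1/2).

    E_g = g/(q - p) - (g + h)/(q - p) * (1 - (q/p)^g) / (1 - (q/p)^(g + h))
    """
    q = 1.0 - p
    r = q / p
    total = g + h
    return g / (q - p) - (total / (q - p)) * (1.0 - r ** g) / (1.0 - r ** total)


if __name__ == "__main__":
    print(expected_duration(10, 90, 0.49))
    # Search whole-dollar starting fortunes with g + h = 100.
    best_g = max(range(1, 100), key=lambda g: expected_duration(g, 100 - g, 0.49))
    print(best_g, expected_duration(best_g, 100 - best_g, 0.49))
```

The search is restricted to integer fortunes; the maximizing value of g is the one quoted in the text.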


EXERCISES 6.3 


1. Find the solution to the random walk and ruin problem if p = 1/2. 
2. Find the expected duration of the gambler’s ruin game if p = 1/2. 


3. Show a graph of the probability of winning the gambler’s ruin game if the game is 
favorable to the gambler with p = 0.51. 
4. Show a graph of the expected duration of the gambler’s ruin game when the game is
favorable to the gambler, say p = 0.51. 


5. Show that if p = 0.49 and h = $116, the probability the gambler will be ruined is at least 
0.99, regardless of the amount of money the gambler has. 


6.4 WAITING TIMES FOR PATTERNS IN BERNOULLI 
TRIALS 


In Chapter 1, we considered a game in which a fair coin is thrown until one of two patterns 
occurs: TH or HH. We concluded that the game is unfair since TH occurs before HH with 
probability 3/4. We also considered other problems in Chapter 2, which we call waiting 
time problems such as waiting for a single Bernoulli success. In Section 6.2, we considered 
waiting for two successes in a row. We now want to consider more general waiting time 
problems, seeking probabilities as well as average waiting times. 

Some of these problems may well reveal solutions that are counterintuitive; we have 
noted that the average waiting time for HH with a fair coin is 6 tosses; the average waiting 
time for TH is 4 tosses, results that can easily be verified by simulation. This is a very 
surprising result for a fair coin; many people suspect that the average waiting time for these 
patterns for a fair coin should be the same, but they actually differ. 
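
The simulation mentioned above takes only a few lines. The following sketch is one way to do it, not the only one; the function names, the number of trials, and the fixed seed are our choices, made so that runs are repeatable.

```python
import random


def waiting_time(pattern, rng, p_heads=0.5):
    """Toss a coin until `pattern` (a string of 'H' and 'T') first appears.

    Returns the number of tosses used.
    """
    recent = ""
    tosses = 0
    while not recent.endswith(pattern):
        toss = "H" if rng.random() < p_heads else "T"
        recent = (recent + toss)[-len(pattern):]   # keep only the last few tosses
        tosses += 1
    return tosses


def average_waiting_time(pattern, trials=100_000, seed=1):
    """Average waiting time for `pattern` over many simulated games with a fair coin."""
    rng = random.Random(seed)
    return sum(waiting_time(pattern, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for pattern in ("HH", "TH"):
        print(pattern, average_waiting_time(pattern))
```

With this many trials the two averages should land close to 6 and 4, respectively, although any single run will differ slightly from those values.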

We now show how to determine probability generating functions from recursions. 
Since the direct construction of recursions for first-time occurrences involving complex 
patterns is difficult, we show how to arrive at the recursions by first creating a generating 
function for occurrence times. Then we use this generating function to find a recursion for 
first-occurrence times. The general technique will be illustrated by an example. 


Example 6.4.1 


A fair coin is thrown until the pattern THT occurs for the first time. On average, how many 
throws will this take? 

First, let us look at the sample space. Let n be the number of tosses necessary to observe 
the pattern THT for the first time. 


n = 3  THT

n = 4  TTHT
       HTHT

n = 5  TTTHT
       HHTHT
       HTTHT

n = 6  TTTTHT
       HHHTHT
       HHTTHT
       THHTHT
       HTTTHT
For n = 7, there are 9 sequences, while there are 16 sequences if n = 8. The pattern in the
sequence 1, 2, 3, 5, 9, 16, ... may prove difficult for the reader to discover; we will find
it later in this section. In any event, the task of identifying points for n = 28, say, is very
difficult to say the least (there are 1,221,537 points to enumerate!).

Before solving the problem, let us first consider a definition of when the pattern THT 
occurs. In a sequence of throws, we examine the sequence from the beginning and note 
when we see the pattern THT; we say the pattern occurs; the examination of the sequence 
then begins again starting with the next throw. For example, in the sequence 


HHTHTHTHHTHT, 


the pattern THT occurs on the 5th and 12th throws; it does not occur on the 7th throw. This 
agreement, which might well strike the reader as strange, is necessary for some simple 
results that follow. 

Let us suppose then that the pattern THT occurs on the nth throw. (This is not necessarily
the first occurrence of the pattern.) Let \(u_n\) denote the probability that the pattern THT
occurs at the nth throw, and consider a sequence whose throws n − 2, n − 1, and n are THT.
The pattern THT then either occurs at the nth trial or it occurs at trial n − 2, followed by HT.
Since any sequence ending in THT has probability \(pq^2\), it follows that

\[ u_n + pq\,u_{n-2} = pq^2, \qquad n \ge 3. \tag{6.10} \]

We take \(u_0 = 1\) (again so that a subsequent result is simple) and of course, \(u_1 = 0\) and
\(u_2 = 0\).
Results from the recursion are interesting. We find, for example, that

\[ u_{15} = pq^2\left(1 - pq + p^2q^2 - p^3q^3 + p^4q^4 - p^5q^5 + p^6q^6\right)
         = \frac{pq^2\left[1 + (pq)^7\right]}{1 + pq}. \]

In general, an examination of special values using a computer algebra system leads to the
conjecture that

\[ u_{2n} = u_{2n-1} = \frac{pq^2\left[1 - (-pq)^{n-1}\right]}{1 + pq}, \qquad n = 1, 2, 3, \ldots. \]

This in fact will satisfy (6.10) and is its general solution.
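
A quick numerical check of this claim is possible without a computer algebra system. The sketch below generates the values from recursion (6.10), with boundary values 1, 0, 0 for n = 0, 1, 2, and compares them with the closed form \(pq^2[1 - (-pq)^{m-1}]/(1 + pq)\), where n = 2m or n = 2m − 1. Both function names are ours, and agreement at a few values of p is evidence for the formula, not a proof.

```python
def u_by_recursion(n_max, p):
    """u_0 .. u_n_max from u_n + p q u_{n-2} = p q^2 for n >= 3."""
    q = 1.0 - p
    u = [1.0, 0.0, 0.0]
    for n in range(3, n_max + 1):
        u.append(p * q * q - p * q * u[n - 2])
    return u[: n_max + 1]


def u_closed_form(n, p):
    """Conjectured closed form for n >= 1, where n = 2m or n = 2m - 1."""
    q = 1.0 - p
    m = (n + 1) // 2
    return p * q * q * (1.0 - (-p * q) ** (m - 1)) / (1.0 + p * q)


if __name__ == "__main__":
    p = 0.3
    u = u_by_recursion(20, p)
    print(max(abs(u[n] - u_closed_form(n, p)) for n in range(1, 21)))
```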
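It is easy to see from this that

\[ \lim_{n\to\infty} u_n = \frac{pq^2}{1 + pq}, \]

since pq < 1 and so \(\lim_{n\to\infty} (pq)^{n-1} = 0\).


This result can also easily be found from (6.10) since, if \(u_n \to L\), say, then \(u_{n-2} \to L\) as
well. So in that case, we have

\[ L + pq\,L = pq^2, \]

whose solution is

\[ L = \frac{pq^2}{1 + pq}. \]
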
Generating Functions


Now we seek a generating function for the sequence of \(u_n\)’s.


Let \(U(s) = \sum_{n=0}^{\infty} u_n s^n\) be the generating function for the \(u_n\)’s. Multiplying both sides
of (6.10) by \(s^n\) and summing, we have

\[ \sum_{n=3}^{\infty} u_n s^n + pq \sum_{n=3}^{\infty} u_{n-2}\, s^n = pq^2 \sum_{n=3}^{\infty} s^n. \]


Expanding and simplifying gives

\[ U(s) - u_2 s^2 - u_1 s - u_0 + s^2 pq\left[U(s) - u_0\right] = \frac{pq^2 s^3}{1 - s}, \]

from which it follows that

\[ U(s) = 1 + \frac{pq^2 s^3}{(1 - s)(1 + pq s^2)}. \]

Now let F(s) denote a generating function for \(f_n\), the probability that THT occurs for the
first time at the nth trial. It can be shown (see Feller [7]) that

\[ F(s) = 1 - \frac{1}{U(s)}. \]

In this case, we have

\[ F(s) = \frac{pq^2 s^3}{1 - s + pq s^2 - p^2 q s^3}. \]


A power series expansion of F(s), found using a computer algebra system, gives the
following:

\begin{align*}
f_3 &= pq^2, \\
f_4 &= pq^2, \\
f_5 &= pq^2(1 - pq), \\
f_6 &= pq^2\left(1 - 2pq + p^2 q\right).
\end{align*}

Now we have a generating function whose coefficients give the probabilities of
first-occurrence times. One could continue to find probabilities from this generating
function using a computer algebra system, but we show now how to construct a recurrence
from the generating function F(s).

Let

\[ F(s) = \frac{pq^2 s^3}{1 - s + pq s^2 - p^2 q s^3} = f_0 + f_1 s + f_2 s^2 + f_3 s^3 + f_4 s^4 + \cdots. \]

It follows that

\[ pq^2 s^3 = \left(1 - s + pq s^2 - p^2 q s^3\right)\left(f_0 + f_1 s + f_2 s^2 + f_3 s^3 + f_4 s^4 + \cdots\right). \]

By equating coefficients, we have

\[ f_0 = 0, \qquad f_1 = 0, \qquad f_2 = 0, \qquad f_3 = pq^2, \]

and for \(n \ge 4\),

\[ f_n = f_{n-1} - pq\,f_{n-2} + p^2 q\,f_{n-3}. \tag{6.11} \]


We have succeeded in finding a recursion for first-occurrence times. The reader with access
to a computer algebra system will discover some interesting patterns in the formulas for the
\(f_n\). Numerical results are also easy to find. For the case p = q, \(f_{20}\), the probability that THT
will be seen for the first time at the 20th trial, is only 0.01295.
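
Recursion (6.11) is easy to run by machine. The sketch below uses exact rational arithmetic from Python's standard `fractions` module, so the fair-coin probabilities appear as fractions with power-of-two denominators; the function name is ours, and p denotes the probability of heads.

```python
from fractions import Fraction


def first_occurrence_probabilities(n_max, p):
    """f_0 .. f_n_max for the first occurrence of THT, from recursion (6.11).

    Starts from f_0 = f_1 = f_2 = 0 and f_3 = p q^2.
    """
    q = 1 - p
    f = [0, 0, 0, p * q * q]
    for n in range(4, n_max + 1):
        f.append(f[n - 1] - p * q * f[n - 2] + p * p * q * f[n - 3])
    return f[: n_max + 1]


if __name__ == "__main__":
    f = first_occurrence_probabilities(20, Fraction(1, 2))
    print(f[20], float(f[20]))
```

Passing a float for p instead of a `Fraction` gives ordinary floating-point results, which is convenient for a loaded coin.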


Finally, we find the pattern in the number of points in the sample space for first-time
occurrences. Let \(w_n\) denote the number of ways THT can occur for the first time in n trials.
Since, when p = q = 1/2,

\[ f_n = \frac{w_n}{2^n}, \]

then

\[ w_n = 2w_{n-1} - w_{n-2} + w_{n-3}, \qquad n \ge 6, \]

where \(w_3 = 1\), \(w_4 = 2\), and \(w_5 = 3\), using (6.11).
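
The counting recursion \(w_n = 2w_{n-1} - w_{n-2} + w_{n-3}\) can be checked against a brute-force enumeration for small n and then used to reach values of n that are out of reach by listing. A sketch, with function names of our own choosing:

```python
from itertools import product


def count_by_recursion(n_max):
    """w_3 .. w_n_max from w_n = 2 w_{n-1} - w_{n-2} + w_{n-3}, with w_3, w_4, w_5 = 1, 2, 3."""
    w = {3: 1, 4: 2, 5: 3}
    for n in range(6, n_max + 1):
        w[n] = 2 * w[n - 1] - w[n - 2] + w[n - 3]
    return w


def count_by_enumeration(n):
    """Count H/T sequences of length n in which THT first appears ending at toss n."""
    count = 0
    for tosses in product("HT", repeat=n):
        sequence = "".join(tosses)
        if sequence.find("THT") == n - 3:   # first appearance starts at toss n - 2
            count += 1
    return count


if __name__ == "__main__":
    w = count_by_recursion(28)
    print([w[n] for n in range(3, 9)], w[28])
    print(all(count_by_enumeration(n) == w[n] for n in range(3, 13)))
```

The enumeration examines all \(2^n\) sequences, so it is practical only for small n; the recursion has no such limit.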


Average Waiting Times 


The recurrence (6.11) can be used to find the mean waiting time for the first occurrence of
the pattern THT. By multiplying through by n and summing, we find that

\[ \sum_{n=4}^{\infty} n f_n = \sum_{n=4}^{\infty} \left[(n-1) + 1\right] f_{n-1}
   - pq \sum_{n=4}^{\infty} \left[(n-2) + 2\right] f_{n-2}
   + p^2 q \sum_{n=4}^{\infty} \left[(n-3) + 3\right] f_{n-3}. \]

This can be simplified as

\[ E(N) - 3pq^2 = E(N) + 1 - pq\left[E(N) + 2\right] + p^2 q\left[E(N) + 3\right], \]

from which we find that

\[ E(N) = \frac{1 + pq}{pq^2}. \]

For a fair coin, the average waiting time for THT to occur is 10 trials.
Means and Variances by Generating Functions


The technique used earlier to find the average waiting time can also be used to find the 
variance of the waiting times. This was illustrated in Example 6.2.2. Now, however, we 
have the probability generating function for first-occurrence times, so it can be used to 
determine means and variances. 

A computer algebra system will show that \(F'(1) = \dfrac{1 + pq}{pq^2}\) and that

\[ F''(1) = \frac{2\left(q + 4p^2 - 2p^3 - 2p^4 + p^5\right)}{p^2 q^4}, \]

and since

\[ \sigma^2 = F''(1) + F'(1) - \left[F'(1)\right]^2, \]

we find that

\[ \sigma^2 = \frac{1 + 2pq - 5pq^2 + p^2q^2 - p^2q^3}{p^2 q^4}. \]

With a fair coin then the average number of throws to see THT is 10 throws with variance 58.
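
These values can also be checked numerically, without differentiating F(s), by summing \(n f_n\) and \(n^2 f_n\) with the \(f_n\) generated from recursion (6.11). The sketch below truncates the sums at a large index, which is an approximation of ours; it is harmless here because the \(f_n\) decay geometrically.

```python
def tht_mean_and_variance(p, n_max=5000):
    """Approximate mean and variance of the first-occurrence time of THT.

    f_n comes from f_n = f_{n-1} - p q f_{n-2} + p^2 q f_{n-3}, with f_3 = p q^2;
    the infinite sums are truncated at n_max.
    """
    q = 1.0 - p
    f = [0.0, 0.0, 0.0, p * q * q]
    for n in range(4, n_max + 1):
        f.append(f[n - 1] - p * q * f[n - 2] + p * p * q * f[n - 3])
    mean = sum(n * fn for n, fn in enumerate(f))
    second_moment = sum(n * n * fn for n, fn in enumerate(f))
    return mean, second_moment - mean ** 2


if __name__ == "__main__":
    print(tht_mean_and_variance(0.5))
```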


EXERCISES 6.4 


1. In Bernoulli trials with p the probability of success at any trial, consider the event SSS.
Let \(u_n\) denote the probability that SSS occurs at the nth trial.
(a) Show that \(u_n + p\,u_{n-1} + p^2 u_{n-2} = p^3\), \(n \ge 4\), and establish the boundary conditions.


(b) Find the generating function, U(s), from the recursion in part (a). Use U(s) to 
determine the probability that SSS occurs at the 20th trial if p = 1/3. 


(c) Find F(s), the generating function for first-occurrence times of the pattern SSS. 
Again, if p = 1/3, find the probability that SSS occurs for the first time at the 20th 
trial. 


(d) Establish that \(w_n\), the number of ways the pattern SSS can occur in n trials for the
first time, is given by

\[ w_n = w_{n-1} + w_{n-2} + w_{n-3}, \]

for an appropriate range of values of n. This would apparently establish a
“super-Fibonacci” sequence.


2. In waiting for the first occurrence of the pattern HTH in tossing a fair coin, we wish to 
create a fair game. We would like the probability the event occurs for the first time in 
n or fewer trials to be 1/2. Find n. 


3. Find the variance of N, the waiting time for the first occurrence of THT, with a 
loaded coin. 


4. Suppose we wait for the pattern TTHTH in Bernoulli trials. 
(a) Find a recursion for the probability of the occurrence of the pattern at the nth trial. 


(b) Find a generating function for occurrence times and, from that, a recurrence for 
first-occurrence times. 


(c) Find the mean and variance of first-occurrence times of the pattern. 
6.5 MARKOV CHAINS


In practical discrete probability situations, Bernoulli trials probably occur with the greatest
frequency. Bernoulli trials are assumed to be independent, with constant probability of success
from trial to trial; these assumptions lead to the binomial random variable. Can the binomial
be generalized in any way? One way to do this is to relax the assumption of independence, 
so that the outcome of a particular trial is dependent on the outcomes of the previous trials. 
The simplest situation assumes that the outcome of a particular trial is dependent only on 
the outcome of the immediately preceding trial. Such trials form what are called, in honor 
of the Russian mathematician, a Markov chain. 

While relaxing the assumption of independence and, in addition, making only the out- 
come of the previous trial an influence on the next trial, seem simple enough, they lead 
to a very complex, although beautiful, theory. We will consider only some of the simpler 
elements of that theory here and will frequently make statements without proof, although 
we will make them plausible. 


Example 6.5.1 


A gambler plays on one of four slot machines each of which has probability 1/10 of paying 
off with some reward. If the player wins on a particular machine, she continues to play on 
that machine; however, if she loses on any machine, she chooses one of the other machines 
with equal probability. 

For convenience, number the machines 1, 2, 3, and 4. Here, we are interested in the
machine being played at the moment, and that is a function of whether or not the player
won on the last play. Now let \(p_{ij}\) denote the probability that machine j is played
immediately following playing machine i. We call this a transition probability since it gives
the probability of going from machine i to machine j.

Now we find some of these transition probabilities. For example,
\(p_{12} = \frac{9}{10} \cdot \frac{1}{3} = \frac{3}{10}\)
since the player must lose with machine 1 and then switch to machine 2 with probability
1/3.

Also \(p_{33} = 1/10\) since the player must win with machine 3 and then stay with that
machine, while \(p_{42} = 3/10\), the calculation being exactly the same as that for \(p_{12}\).

The remaining transition probabilities are equally easy to calculate in this case. It
is most convenient to display these transition probabilities as a matrix, T, whose entries
are \(p_{ij}\):

\[ T = \begin{pmatrix}
\frac{1}{10} & \frac{3}{10} & \frac{3}{10} & \frac{3}{10} \\
\frac{3}{10} & \frac{1}{10} & \frac{3}{10} & \frac{3}{10} \\
\frac{3}{10} & \frac{3}{10} & \frac{1}{10} & \frac{3}{10} \\
\frac{3}{10} & \frac{3}{10} & \frac{3}{10} & \frac{1}{10}
\end{pmatrix}. \]

This transition matrix is called stochastic since the row sums are each 1. Of course that has
to be true since the player either stays with the machine currently being played or moves
to some other machine for the next play. Actually, T is doubly stochastic since its column
sums are 1 as well, but such matrices will not be of great interest to us. Matrices describing
Markov chains, however, must be stochastic.

The course of the play is of interest, so we might ask, “What is the probability the
player moves from machine 2 to machine 4 after two plays?” We denote this probability by
\(p_{24}^{(2)}\).

Since the transition from machine 2 to machine 4 involves two plays, the player on the
first play goes from machine 2 to machine 1, machine 2, machine 3, or machine 4, and on the
second play moves to machine 4, so we see that

\[ p_{24}^{(2)} = p_{21}p_{14} + p_{22}p_{24} + p_{23}p_{34} + p_{24}p_{44}
   = \frac{3}{10}\cdot\frac{3}{10} + \frac{1}{10}\cdot\frac{3}{10} + \frac{3}{10}\cdot\frac{3}{10} + \frac{3}{10}\cdot\frac{1}{10}
   = \frac{6}{25}. \]

But this sum is simply the dot product of the second row of T with the fourth column of
T, that is, an entry of the matrix product of T with itself. Hence, \(T^2 = \left(p_{ij}^{(2)}\right)\) where \(T^2\)
denotes the usual matrix product of T with itself. The reader should check that the remaining
entries of \(T^2\) give the proper two-step transition probabilities. We have

\[ T^2 = \begin{pmatrix}
\frac{7}{25} & \frac{6}{25} & \frac{6}{25} & \frac{6}{25} \\
\frac{6}{25} & \frac{7}{25} & \frac{6}{25} & \frac{6}{25} \\
\frac{6}{25} & \frac{6}{25} & \frac{7}{25} & \frac{6}{25} \\
\frac{6}{25} & \frac{6}{25} & \frac{6}{25} & \frac{7}{25}
\end{pmatrix}. \]


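The entries of \(T^n\) then represent transition probabilities in n steps. A computer algebra
system is handy in finding these powers. In this case, we find that

\[ T^4 = \begin{pmatrix}
\frac{157}{625} & \frac{156}{625} & \frac{156}{625} & \frac{156}{625} \\
\frac{156}{625} & \frac{157}{625} & \frac{156}{625} & \frac{156}{625} \\
\frac{156}{625} & \frac{156}{625} & \frac{157}{625} & \frac{156}{625} \\
\frac{156}{625} & \frac{156}{625} & \frac{156}{625} & \frac{157}{625}
\end{pmatrix}. \]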
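
Readers without a computer algebra system can find these powers with a few lines of Python. The sketch below multiplies matrices stored as lists of rows and uses the standard `fractions` module so that the entries stay exact; the helper names `mat_mul` and `mat_pow` are ours.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """m raised to a positive integer power by repeated multiplication."""
    result = m
    for _ in range(power - 1):
        result = mat_mul(result, m)
    return result


# Slot machine chain: stay with probability 1/10, move to each other machine with 3/10.
stay, switch = Fraction(1, 10), Fraction(3, 10)
T = [[stay if i == j else switch for j in range(4)] for i in range(4)]

if __name__ == "__main__":
    print(mat_pow(T, 2)[1][3])      # two-step probability from machine 2 to machine 4
    print(mat_pow(T, 4)[0])         # first row of the fourth power
```

Rows and columns are indexed from 0 in the code, so machine 2 to machine 4 is entry `[1][3]`.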
Each of the entries in \(T^4\) is now very close to 1/4, and we conjecture that

\[ T^n \to \begin{pmatrix}
\frac14 & \frac14 & \frac14 & \frac14 \\
\frac14 & \frac14 & \frac14 & \frac14 \\
\frac14 & \frac14 & \frac14 & \frac14 \\
\frac14 & \frac14 & \frac14 & \frac14
\end{pmatrix}. \]


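This shows, provided that our conjecture is correct, that the player will play each of the
machines about 1/4 of the time as time goes on.

The vector \(\left(\frac14, \frac14, \frac14, \frac14\right)\) is called a fixed vector for the matrix T since

\[ \left(\tfrac14, \tfrac14, \tfrac14, \tfrac14\right)
\begin{pmatrix}
\frac{1}{10} & \frac{3}{10} & \frac{3}{10} & \frac{3}{10} \\
\frac{3}{10} & \frac{1}{10} & \frac{3}{10} & \frac{3}{10} \\
\frac{3}{10} & \frac{3}{10} & \frac{1}{10} & \frac{3}{10} \\
\frac{3}{10} & \frac{3}{10} & \frac{3}{10} & \frac{1}{10}
\end{pmatrix}
= \left(\tfrac14, \tfrac14, \tfrac14, \tfrac14\right). \]

We say that a nonzero vector, w, is a fixed vector for the matrix T if
wT = w.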


Many (but not all) transition matrices have fixed vectors, and when they do have fixed 
vectors the components are rarely equal, unlike the case with the matrix T. The fixed vector, 
if there is one, shows the steady state of the process under consideration. 

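Is the constant vector with each entry 1/4 a function of the probability, 1/10, of staying
with a winning machine? To answer this, suppose p is the probability the player stays with
a winning machine and switches then with probability (1 − p)/3 to each of the other machines.
The transition matrix is then

\[ P = \begin{pmatrix}
p & \frac{1-p}{3} & \frac{1-p}{3} & \frac{1-p}{3} \\
\frac{1-p}{3} & p & \frac{1-p}{3} & \frac{1-p}{3} \\
\frac{1-p}{3} & \frac{1-p}{3} & p & \frac{1-p}{3} \\
\frac{1-p}{3} & \frac{1-p}{3} & \frac{1-p}{3} & p
\end{pmatrix}. \]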
We find that the vector \(\left(\frac14, \frac14, \frac14, \frac14\right)\) is a fixed vector for the matrix P for any 0 < p < 1,
so the player will use each of the machines about 1/4 of the time, regardless of the value
for p!

It is common to refer to the possible positions in a Markov chain as states. In this
example, the machines are the states and the transition probabilities are probabilities of
moving from state to state. The states in Markov chains often represent the outcomes of the
experiment under consideration.


Example 6.5.2 


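Consider the transition matrix with three states

\[ R = \begin{pmatrix}
\frac12 & \frac14 & \frac14 \\
\frac18 & \frac34 & \frac18 \\
\frac23 & \frac16 & \frac16
\end{pmatrix}. \]

Calculation will show that powers of R approach the matrix

\[ \begin{pmatrix}
\frac{6}{17} & \frac{8}{17} & \frac{3}{17} \\
\frac{6}{17} & \frac{8}{17} & \frac{3}{17} \\
\frac{6}{17} & \frac{8}{17} & \frac{3}{17}
\end{pmatrix}. \]

It will also be found that the equation

\[ (a, b, c) \begin{pmatrix}
\frac12 & \frac14 & \frac14 \\
\frac18 & \frac34 & \frac18 \\
\frac23 & \frac16 & \frac16
\end{pmatrix} = (a, b, c), \]

with the restriction that a + b + c = 1, has the solution \(\left(\frac{6}{17}, \frac{8}{17}, \frac{3}{17}\right)\), so R has a fixed
vector also.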
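
One way to see both facts at once is to start from any probability vector and multiply repeatedly by R: the result settles down to the common row of the limiting matrix, which is also the fixed vector. The sketch below does this in floating point. The entries of R are taken as we read them from the fixed-vector equation in this example, and the function name and the number of steps are our choices.

```python
def fixed_vector(matrix, steps=200):
    """Approximate the fixed probability vector w (with w M = w) of a regular
    transition matrix by repeatedly multiplying a starting distribution by M."""
    n = len(matrix)
    w = [1.0 / n] * n                       # start from the uniform distribution
    for _ in range(steps):
        w = [sum(w[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return w


R = [[1 / 2, 1 / 4, 1 / 4],
     [1 / 8, 3 / 4, 1 / 8],
     [2 / 3, 1 / 6, 1 / 6]]

if __name__ == "__main__":
    print(fixed_vector(R))
    print([6 / 17, 8 / 17, 3 / 17])
```

This iteration is only an approximation and relies on the matrix being regular; for exact answers one would solve the linear system wR = w together with a + b + c = 1.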

The examples above illustrate the remarkable fact that the powers of some matrices 
approach a matrix with equal rows. We can be more specific now about the conditions 
under which this happens. 


Definition We call a matrix T regular if, for some n, the entries of \(T^n\) are all positive (no
zeroes are allowed).


In the earlier examples, T and R are regular. We now state a theorem without proof. 
Theorem: If T is a regular transition matrix, then the powers of T, \(T^n\), \(n \ge 1\),
approach a matrix all of whose rows are the same probability vector w.


Readers interested in the proof of this are referred to Isaacson and Madsen [21] or 
Kemeny et al. [24]. 


Example 6.5.3 
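

The matrix

\[ K = \begin{pmatrix}
\frac12 & 0 & \frac12 \\
0 & 1 & 0 \\
0 & \frac14 & \frac34
\end{pmatrix} \]

is not regular, since the center row remains fixed, regardless of the power of K being
considered. But

\[ (0, 1, 0) \begin{pmatrix}
\frac12 & 0 & \frac12 \\
0 & 1 & 0 \\
0 & \frac14 & \frac34
\end{pmatrix} = (0, 1, 0), \]

showing that the vector (0, 1, 0) is, however, a fixed vector for K.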


Example 6.5.4 


The matrix

\[ A = \begin{pmatrix}
1 & 0 & 0 \\
a & 0 & 1-a \\
0 & 0 & 1
\end{pmatrix} \]

for 0 < a < 1 is not regular since \(A^n = A\) for \(n \ge 2\), and
each row of A is a fixed vector. This shows that a nonregular matrix can have more than one
fixed vector. If, however, a regular matrix T has a unique fixed vector, then each row of the
matrix \(T^n\) approaches that fixed vector.

To see this, suppose that the vector \(w = (w_1, w_2, w_3)\) is a fixed probability vector for
some 3 by 3 matrix T. Then wT = w, so \(wT^2 = (wT)T = wT = w\) and so on. But if \(T^n \to K\),
where K is a matrix with identical fixed rows, say

\[ K = \begin{pmatrix} a & b & c \\ a & b & c \\ a & b & c \end{pmatrix}, \]

then wK = w, or

\[ (w_1, w_2, w_3) \begin{pmatrix} a & b & c \\ a & b & c \\ a & b & c \end{pmatrix} = (w_1, w_2, w_3). \]

So \(w_1 a + w_2 a + w_3 a = w_1\) and since \(w_1 + w_2 + w_3 = 1\), \(w_1 = a\). A similar argument
shows that \(w_2 = b\) and \(w_3 = c\), establishing the result. It is easy to reproduce the argument
for any size matrix T.
What influence does the initial position in the chain have? We might conjecture
that after a large number of transitions, the initial state has little, if any, influence on the
long-term result. To see that this is indeed the case, suppose that there is a probability
vector, \(P^0 = \left(p_1^0, p_2^0, p_3^0\right)\), whose components give the probability of the process starting
in any of the three states and where \(p_1^0 + p_2^0 + p_3^0 = 1\). (We again argue, without any loss of
generality, for a 3 by 3 matrix T.)
\(P^0 T\) is a vector, say \(P^1\), whose components give the probability that the process is in
any of the three states after one step.
Then \(P^1 T = P^0 T \cdot T = P^0 T^2\) and so on. But if \(T^n \to K\), and if the fixed vector for the
matrix K is, say, \((k_1, k_2, k_3)\), then

\[ P^0 K = (k_1, k_2, k_3) \]

since the components of \(P^0\) add up to 1, showing the probability that the process is in any
of the given states after a large number of transitions is independent of \(P^0\).
Now we discuss a number of Markov chains and some of their properties. 


Example 6.5.5 


The random walk and ruin problem of Section 6.3 can be considered to be a Markov
chain, the states representing the fortunes of the player. For simplicity, suppose that $1
is exchanged at each play, that the player has probability p of winning each game (and
losing each game with probability q = 1 − p), and that the player’s fortune is $n, while the
opponent holds $(4 − n); the boundary conditions are \(P_0 = 0\) and \(P_4 = 1\). There are five
states representing fortunes of $0, $1, $2, $3, and $4. The transition matrix is

\[ T = \begin{array}{c|ccccc}
  & 0 & 1 & 2 & 3 & 4 \\ \hline
0 & 1 & 0 & 0 & 0 & 0 \\
1 & q & 0 & p & 0 & 0 \\
2 & 0 & q & 0 & p & 0 \\
3 & 0 & 0 & q & 0 & p \\
4 & 0 & 0 & 0 & 0 & 1
\end{array}, \]

reflecting the facts that the game is over when the states n = 0 or n = 4 are reached. These 
states are called absorbing states since, once entered, they are impossible to leave. We call a 
Markov chain absorbing if it has at least one absorbing state and if it is possible to move to 
some absorbing state from any nonabsorbing state in a finite number of moves. The matrix 
T describes an absorbing Markov chain. 

It will be useful to reorder the states in an absorbing chain so that the absorbing states
come first and the nonabsorbing states follow. If we do this for the matrix T (taking the
states in the order 0, 4, 1, 2, 3), we find that

\[ T = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
q & 0 & 0 & p & 0 \\
0 & 0 & q & 0 & p \\
0 & p & 0 & q & 0
\end{pmatrix}. \]

We can also write the matrix in block form as

\[ T = \begin{pmatrix} I & O \\ R & Q \end{pmatrix}, \]

where I is an identity matrix, O is a matrix each of whose entries is 0, and Q is a matrix
whose entries are the transition probabilities from one nonabsorbing state to another.
Any transition matrix with absorbing states can be written in the above-mentioned block
form.
In addition, matrix multiplication shows that

\[ T^n = \begin{pmatrix} I & O \\ \ast & Q^n \end{pmatrix}, \]

where \(\ast\) stands for a block whose exact form is not needed here.

The entries of Q” then give the probabilities of going from one nonabsorbing state to another 
in n steps. 

It is a fact that if a chain has absorbing states, then eventually one of the absorbing 
states will be reached. The central reason for this is that any path avoiding the absorbing 
states has a probability that tends to 0 as the number of steps in the path increases. 

The possible paths taken in a Markov chain are of some interest and one might consider,
on average, how many times a nonabsorbing state is reached. Consider a particular
nonabsorbing state, say state j.

The entries of Q give the probabilities of reaching j from any other nonabsorbing state,
say i, in one step. The entries of \(Q^2\) give the probabilities of reaching state j from state i in
two steps, and, in general, the entries of \(Q^n\) give the probabilities of reaching state j from
state i in n steps. Now define a sequence of indicator random variables:

\[ X_k = \begin{cases} 1 & \text{if the chain is in state } j \text{ in } k \text{ steps} \\ 0 & \text{otherwise.} \end{cases} \]

Then X, the total number of times the process is in state j, is

\[ X = \sum_{k} X_k, \]

and so the expected value of X, the expected number of times the chain is in state j, is

\[ E(X) = I_{ij} + \sum_{n=1}^{\infty} q_{ij}^{(n)}, \]

where \(q_{ij}^{(n)}\) is the (i, j) entry in the matrix \(Q^n\) and where \(I_{ij}\) = 1 or 0, depending on whether or
not the chain starts in state j.

This shows that, in matrix form, \(E(X) = I + Q + Q^2 + Q^3 + \cdots\), where I is the identity
matrix of the same size as Q.

It can be shown in this circumstance that \((I - Q)^{-1} = I + Q + Q^2 + Q^3 + \cdots\), and so
the entries of \((I - Q)^{-1}\) give the expected number of times the process is in state j, given
that it starts in state i.

The matrix \((I - Q)^{-1}\) is called the fundamental matrix for the absorbing Markov
chain.
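Example 6.5.6


Consider the transition matrix

\[ T = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
\frac14 & 0 & 0 & \frac34 & 0 \\
0 & 0 & \frac14 & 0 & \frac34 \\
0 & \frac34 & 0 & \frac14 & 0
\end{pmatrix} \]

representing a random walk. The matrix Q here is

\[ Q = \begin{pmatrix} 0 & \frac34 & 0 \\ \frac14 & 0 & \frac34 \\ 0 & \frac14 & 0 \end{pmatrix}
\quad \text{and} \quad
I - Q = \begin{pmatrix} 1 & -\frac34 & 0 \\ -\frac14 & 1 & -\frac34 \\ 0 & -\frac14 & 1 \end{pmatrix}, \]

so

\[ (I - Q)^{-1} = \begin{pmatrix}
\frac{13}{10} & \frac{6}{5} & \frac{9}{10} \\
\frac{2}{5} & \frac{8}{5} & \frac{6}{5} \\
\frac{1}{10} & \frac{2}{5} & \frac{13}{10}
\end{pmatrix}. \]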

If the chain starts in state 1, it spends on average 13/10 times in state 1, 6/5 times in state
2, and 9/10 times in state 3. This means that, starting in state 1, the total number of times
in various states is 13/10 + 6/5 + 9/10 = 17/5. So the average number of turns before absorption
must be 17/5 if the process begins in state 1.

Similar calculations can be made for the other beginning states. If we let V be a column
vector each of whose entries is 1, then \((I - Q)^{-1}V\) gives, for each starting state, the average
number of steps the process takes before being absorbed. Here,

\[ (I - Q)^{-1}V = \begin{pmatrix}
\frac{13}{10} & \frac{6}{5} & \frac{9}{10} \\
\frac{2}{5} & \frac{8}{5} & \frac{6}{5} \\
\frac{1}{10} & \frac{2}{5} & \frac{13}{10}
\end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} \frac{17}{5} \\ \frac{16}{5} \\ \frac{9}{5} \end{pmatrix}. \]

We continue with further examples of Markov chains. 
Example 6.5.7


In Example 6.5.5, we considered a game in which $1 was exchanged at each play and where 
the game ended if either player was ruined. Now consider players who prefer that the game 
never end. They agree that, if either player is ruined, the other player gives the ruined player 
$1 so that the game can continue. This creates a random walk with reflecting barriers. 

Here is an example of such a random walk. p is the probability a player wins at any 
play, g = | — p and there are four possible states. The transition matrix is 


er OV O 
SoS So: SS 


Go OR CO 
ok OF 


It is probably not surprising to learn that M is not a regular transition matrix. Powers of M 
do, however, approach a matrix having, in this case, two sets of identical rows. We find, for 
example, that if p = 2/3, then 


S 

J 
ATI Oo NIR Oo 
ONIwW OO NIW 
AUN OAD SO 
oN ONS 
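
The alternating pattern can be seen by raising the matrix to a moderately high even power. This is a small sketch, assuming the reflecting-barrier matrix with rows (0, 1, 0, 0), (q, 0, p, 0), (0, q, 0, p), (0, 0, 1, 0) and p = 2/3; the function name `matmul` is illustrative.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

p, q = F(2, 3), F(1, 3)
M = [[0, 1, 0, 0],
     [q, 0, p, 0],
     [0, q, 0, p],
     [0, 0, 1, 0]]

power = M
for _ in range(39):
    power = matmul(power, M)      # after the loop, power equals M**40

approx = [[float(x) for x in row] for row in power]
```

Because the walk changes parity at every step, the entries that link states of opposite parity are exactly zero in every even power, which is why the limit has two sets of identical rows rather than one.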


Example 6.5.8 


A private grade school offers instruction in grades K, 1, 2, and 3. At the end of each academic
year, a student can be promoted (with probability p), asked to repeat the grade (with
probability r), or asked to return to the previous grade (with probability 1 - p - r = q). The
transition matrix, with rows and columns in the order K, 1, 2, 3, is

\[
\begin{pmatrix}
1 - p & p & 0 & 0 \\
q & r & p & 0 \\
0 & q & r & p \\
0 & 0 & q & 1 - q
\end{pmatrix}.
\]

For the particular matrix,

\[
\begin{pmatrix}
0.3 & 0.7 & 0 & 0 \\
0.1 & 0.2 & 0.7 & 0 \\
0 & 0.1 & 0.2 & 0.7 \\
0 & 0 & 0.1 & 0.9
\end{pmatrix},
\]

we find the fixed vector to be (0.0025, 0.0175, 0.1225, 0.8575).
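
One way to confirm a fixed vector numerically is to apply the transition matrix repeatedly to any starting distribution. The sketch below assumes the particular matrix above (p = 0.7, r = 0.2, q = 0.1); the function name `step` and the choice of 200 iterations are illustrative, not part of the text.

```python
# Transition matrix for grades K, 1, 2, 3 with p = 0.7, r = 0.2, q = 0.1.
P = [[0.3, 0.7, 0.0, 0.0],
     [0.1, 0.2, 0.7, 0.0],
     [0.0, 0.1, 0.2, 0.7],
     [0.0, 0.0, 0.1, 0.9]]

def step(v, P):
    """One step of the chain: the row vector v multiplied by the matrix P."""
    return [sum(v[i] * P[i][j] for i in range(len(v))) for j in range(len(P[0]))]

v = [0.25, 0.25, 0.25, 0.25]      # any starting distribution works for a regular chain
for _ in range(200):
    v = step(v, P)

fixed_vector = [round(x, 4) for x in v]
```

The limit is proportional to (1, 7, 49, 343), since each grade is entered from below with probability 0.7 and left downward with probability 0.1; dividing by 400 gives the vector quoted in the example.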
EXERCISES 6.5


1. Show that an n by n doubly stochastic transition matrix has a fixed vector each of whose
entries is 1/n.


2. A family on vacation goes either camping or to a theme park. If the family camped one
year, it goes camping again the next year with probability 0.7; if it went to a theme park
one year, it goes camping the next year with probability 0.4. Show that the process is a
Markov chain. In the long run, how often does the family go camping?


3. Voters often change their party affiliation in subsequent elections. In a certain district, 
Republicans remain Republicans for the next election with probability 0.8. Democrats 
stay with their party with probability 0.9. Show that the process is a Markov chain and 
find the fixed vector. 

4. A small manufacturing company has two boxes of parts. Box I has five good parts in 
it while box II has 6 good parts in it. There is one defective part, which initially is in 
the first box. A part is drawn out from box I and put into box II; on the second draw, a 
part is drawn from box II and put into box I. After five draws, what is the probability 
that the defective part is in the first box? 

5. Electrical usage during a summer month can be classified as “normal,” “high,” or 
“low.” Weather conditions often make this level of usage change according to the fol- 
lowing matrix: 


= 


NIE WIN BIW S 


x 
MIN WlR Ale = 
sl- ala Bl- 


mM 


Find the fixed vector for this Markov chain and interpret its meaning. 


6. A local stock either gains value (+), remains the same (0), or loses value (—) during a 
trading day according to the following matrix: 


0 = 
iid 
3° 3 3 
1 1 
Ol- O sh 
2 2 
_{i iol 
1 2 


If you were to bet on the stock’s performance tomorrow, how would you bet? 


7. Show that the fixed vector for the transition matrix

\[ \begin{pmatrix} p & 1 - p \\ r & 1 - r \end{pmatrix}, \]

where 0 < p < 1, 0 < r < 1, and \(1 - p + r \neq 0\), is

\[ \left( \frac{r}{1 - p + r}, \; \frac{1 - p}{1 - p + r} \right). \]

8. Alter the gambler’s ruin situation in Example 6.5.5 as follows. Suppose that if a gambler
is ruined, the opposing player returns $2 to him so that the game can go on (in fact,
it can now go on forever!). Show the transition matrix if the probability of a gambler
winning a game is 2/3. If the matrix has a fixed vector, find it.


9. The states in a Markov chain that is called cyclical are 0, 1, 2, 3, 4, and 5. If the chain 
is in state 0, the next state can be 5 or 2, and if the chain is in state 5 it can go to state 
0 or 4. Show the transition matrix for this chain with probability 1/2 of moving from 
one state to another possible state. If the matrix approaches a steady state, find it. 


CHAPTER REVIEW 


This chapter considers two primary topics, recursions and Markov chains. 

Recursions are used when it is possible to express one probability, as a function of
some variable, say n, in terms of other probabilities as functions of that same variable,
n. In Example 6.1.2, we tossed a loaded coin until it came up heads twice in a row. If
\(a_n\) represents the probability that HH occurs for the first time at the nth toss, then
\(a_n = q\,a_{n-1} + pq\,a_{n-2}\), \(n \ge 3\), with \(a_1 = 0\) and \(a_2 = p^2\). Values of \(a_n\) can easily be found using a
computer algebra system. Frequently, such systems will also solve recursions, producing
formulas for the variable as a function of n. We showed an algebraic technique for solving
recursions involving a characteristic equation, and homogeneous and particular solutions.
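
The recursion for the first occurrence of HH is easy to iterate directly. This is a minimal sketch assuming exact rational arithmetic and a cutoff of 200 tosses; the function name `first_hh_probs` and the cutoff are illustrative choices.

```python
from fractions import Fraction as F

def first_hh_probs(p, n_max):
    """a_n = P(HH first occurs at toss n), from a_n = q*a_(n-1) + p*q*a_(n-2)."""
    q = 1 - p
    a = {1: F(0), 2: p * p}       # initial values a_1 = 0 and a_2 = p^2
    for n in range(3, n_max + 1):
        a[n] = q * a[n - 1] + p * q * a[n - 2]
    return a

a = first_hh_probs(F(1, 2), 200)              # a fair coin
total = sum(a.values())                       # approaches 1
mean = sum(n * a_n for n, a_n in a.items())   # approaches the mean waiting time
```

For a fair coin the truncated sums come out at essentially 1 and 6; the value 6 agrees with the known mean waiting time \((1 + p)/p^2\) for two heads in a row.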

Generating functions associated with a recursion, such as \(G(s) = \sum_{n=0}^{\infty} a_n s^n\), were
also considered. These are often of use when recursions are not easily found directly. We
illustrated how to find a recursion for the event “THT occurs at the nth trial” and a generating
function, U(s), for the probability that THT occurs at the nth trial. The generating function
for first-time occurrences, F(s), is simply related to that of U(s):

\[ F(s) = 1 - \frac{1}{U(s)}. \]

We then showed how to find a recursion for first-time occurrences, given F(s).


When the events in question form a probability distribution, means and variances can
be determined from recursions. For example, if the recursion is

\[ a_n = f \cdot a_{n-1} + g \cdot a_{n-2}, \quad n \ge 2, \]

with initial values \(a_0\) and \(a_1\), and where f and g are constants, then

\[
\sum_{n=2}^{\infty} n\,a_n = f \sum_{n=2}^{\infty} [(n - 1) + 1]\,a_{n-1} + g \sum_{n=2}^{\infty} [(n - 2) + 2]\,a_{n-2},
\]

from which it follows that

\[ E[N] = \frac{a_1 + f(1 - a_0) + 2g}{1 - f - g}. \]

Variances can also be determined from the recursion.
Markov chains arise when a process, consisting of a series of trials, can be regarded at
any time as being in a particular state. The matrix \(T = [p_{ij}]\), where \(p_{ij}\) represents
the probability that the process goes from state i to state j, is called a transition matrix. The
transition matrix gives all the information needed about the process.

Diagonal entries of 1 in T indicate absorbing states, that is, states which once entered cannot be
left. It is possible to partition the transition matrix for an absorbing chain as

\[ T = \begin{pmatrix} I & O \\ R & Q \end{pmatrix}. \]

The matrix \((I - Q)^{-1}\) is called the fundamental matrix for the Markov chain. Its entries
give the average number of times the process is in state j, given that it started in
state i.


PROBLEMS FOR REVIEW 


Exercises 6.2 #2, 3, 4, 6, 9
Exercises 6.3 #1, 2 
Exercises 6.4 #1, 3 
Exercises 6.5 #2, 4, 6 


SUPPLEMENTARY EXERCISES FOR CHAPTER 6 


1. Consider a sequence of Bernoulli trials with a probability p of success. 


(a) Find a recursion giving the probability u,, that the number of successes in n trials 
is divisible by 3. 

(b) Find a recursion giving the probability that when the number of successes in n 
trials is divided by 3, the remainder is 1. [Hint: Write a system of three recursions 
involving u,,, the probability that the number of successes is divisible by 3; v,,, the 
probability that the number of successes leaves a remainder of 1 when divided by 
3; and w,,, the probability that the number of successes leaves a remainder of 2 
when divided by 3.] 


2. Find a recursion for the probability g,, that there is no run of three successes in n 
Bernoulli trials where the probability of success at any trial is 1/2. 


3. Find the probability of an even number of successes in n Bernoulli trials where p is the 
probability of success at a single trial. 


4. Find the probability that no two successive heads occur when a coin, loaded to come 
up heads with probability p, is tossed 12 times. 


5. A loaded coin, whose probability of coming up heads at a single toss is p, is tossed and a
running count of the heads and tails is kept. Show that if \(u_n\) = P(heads and tails count is
equal at toss 2n), then \(u_n = \binom{2n}{n} p^n q^n\). Then find the probability that the heads and tails
count is equal for the first time at trial 2n. (The binomial expansion of \((1 - 4pqs)^{1/2}\)
will help.)
6. Automobile buyers of brands A, B, and C stay or change brands according to the following
matrix:


ABC 
Ae a ee 
5 10 10 
ge 2 bk 
6 3 6 
1 1 3 
Cla. 2 2 
8 8 4 


After several years, what share of the market does brand C have? 


7. A baseball pitcher throws curves (C), sliders (S), and fast balls (F’). He changes pitches 
with the following probabilities: 


Q 
nA 
=| 


S| WIN Ale 
MW Wl Ble 


The next batter hits fast balls well. Should he be replaced in the line up? 


8. A small town has two supermarkets, K and C. A shopper who last shopped at K is as 
likely as not to return there on the next shopping trip. However, if a shopper shopped 
at C, the probability is 2/3 that K will be chosen for the next shopping trip. What 
proportion of the time does the shopper shop at K? 


9. Two players, A and B, play chess according to the following rule: the winner of a game 
plays the white pieces on the next game. If the probability of winning with the white 
pieces is p for either player and if A plays the white pieces on the first game, what is 
the probability that A wins the nth game? 
Some Challenging Problems


Five problems, or groups of problems, are introduced here. The intention is for the reader 
to investigate, verify, and add to or extend these problems. In many cases, results are stated 
without proof, and in these cases proofs may be difficult. Mathematica has been used widely 
and this tool, or a computer equivalent, will be very useful in achieving these goals. 


7.1 MY SOCKS AND √π


I have 7 pairs of socks in a drawer. I do not sort them by pairs, so they are randomly
distributed in the drawer. I select socks at random until I have a pair. The probability I get a
pair in 2 drawings is \(\frac{14}{14} \cdot \frac{1}{13} = \frac{1}{13}\) since the first sock can be any one of 14 socks and the
second sock must be the other sock in the pair represented by the first sock.

The probability it takes 3 draws to get a pair is \(\frac{14}{14} \cdot \frac{12}{13} \cdot \frac{2}{12} = \frac{2}{13}\) since the first sock
can be any one of the 14 socks in the drawer, the second sock must not match the first, and
the third sock can match either of the first two socks drawn.

In a similar way, we can find the probability distribution of the random variable X, the
number of draws it takes to get a pair. The probability distribution is shown in the following
table:

x           2      3      4        5        6        7        8
P(X = x)    1/13   2/13   30/143   32/143   80/429   16/143   16/429

The sum of the probabilities is 1/13 + 2/13 + 30/143 + 32/143 + 80/429 + 16/143 + 16/429 = 1,
as it should be.

We can also compute

\[
E[X] = 2 \cdot \frac{1}{13} + 3 \cdot \frac{2}{13} + 4 \cdot \frac{30}{143} + 5 \cdot \frac{32}{143} + 6 \cdot \frac{80}{429} + 7 \cdot \frac{16}{143} + 8 \cdot \frac{16}{429} = \frac{2048}{429} = 4.7739
\]

and

\[
E[X^2] = 2^2 \cdot \frac{1}{13} + 3^2 \cdot \frac{2}{13} + 4^2 \cdot \frac{30}{143} + 5^2 \cdot \frac{32}{143} + 6^2 \cdot \frac{80}{429} + 7^2 \cdot \frac{16}{143} + 8^2 \cdot \frac{16}{429} = \frac{10822}{429}
\]

Probability: An Introduction with Statistical Applications, Second Edition. John J. Kinney. 
so that

\[
\mathrm{Var}[X] = E[X^2] - (E[X])^2 = \frac{10822}{429} - \left(\frac{2048}{429}\right)^2 = \frac{448334}{184041} = 2.4361.
\]

We also note here that \(E[X] = 2^{14}/\binom{14}{7} = \frac{2048}{429}\) and that \(E[X] = \sqrt{\pi}\,\Gamma[8]/\Gamma[7 + 1/2]\), where Γ refers to the
Euler gamma function. In general, \(\Gamma[n + 1/2] = \frac{(2n)!\,\sqrt{\pi}}{2^{2n}\,n!}\) and \(\Gamma[n] = (n - 1)!\).


Now we generalize the problem to n pairs of socks in the drawer. It is fairly easy to see
that

\[
P(X = x) = \frac{2n(2n - 2)(2n - 4) \cdots [2n - 2(x - 2)]\,(x - 1)}{2n(2n - 1)(2n - 2) \cdots [2n - (x - 1)]}
\]

and with some simplification this becomes

\[
P(X = x) = \frac{2^{x-1}(x - 1)\binom{2n - x}{n}(n - x)!}{\binom{2n}{n}(n - x + 1)!} \quad \text{for } x = 2, 3, \ldots, n + 1.
\]

The factorials are purposely not simplified so that the formula will avoid a division by zero
for x = n + 1. If the factorials are simplified, then it is easy to see that

\[ P(X = n + 1) = \frac{2^n}{\binom{2n}{n}}. \]

This is a probability distribution since

\[
\sum_{x=2}^{n+1} \frac{2^{x-1}(x - 1)\binom{2n - x}{n}(n - x)!}{\binom{2n}{n}(n - x + 1)!} = 1.
\]
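
The distribution can also be built draw by draw, which sidesteps the factorial bookkeeping altogether. The following is a minimal sketch in Python with exact fractions; the function name `sock_pmf` is illustrative and not from the text.

```python
from fractions import Fraction as F

def sock_pmf(n):
    """P(X = x) for x = 2, ..., n + 1 when the drawer holds n unsorted pairs."""
    pmf = {}
    no_pair_yet = F(1)   # P(the first x - 1 socks all come from different pairs)
    for x in range(2, n + 2):
        remaining = 2 * n - (x - 1)                  # socks left before draw x
        # Draw x completes a pair if it matches one of the x - 1 socks in hand.
        pmf[x] = no_pair_yet * F(x - 1, remaining)
        # Otherwise draw x comes from yet another pair.
        no_pair_yet *= F(remaining - (x - 1), remaining)
    return pmf

pmf = sock_pmf(7)
mean = sum(x * p for x, p in pmf.items())
variance = sum(x * x * p for x, p in pmf.items()) - mean ** 2
```

For n = 7 this reproduces the table of probabilities for x = 2, ..., 8, the mean 2048/429 and the variance 448334/184041.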

If a computer algebra system, such as Mathematica, is available, then computation for
any value of n is easy and graphs can be drawn. Here, for example, is the graph of P(X = x)
for n = 100.

[Figure: P(X = x) plotted against x for n = 100; the vertical axis is labeled "Probability".]
It appears that the maximum probability is near 15 or so. Here are values of x and
P(X = x) for values near 15:

{12., 0.043533}, {13., 0.0449645}, {14., 0.0458461},
{15., 0.0461874}, {16., 0.0460091}, {17., 0.0453423},

so the maximum is indeed at 15.

To determine the maximum in general, it is easiest to use a recursion. After some
simplification, we find that

\[ \frac{P(X = x + 1)}{P(X = x)} = \frac{2x(n + 1 - x)}{(2n - x)(x - 1)}. \]

Solving

\[ \frac{2x(n + 1 - x)}{(2n - x)(x - 1)} = 1, \]

we find that the maximum occurs near \(\frac{1}{2}\left(1 + \sqrt{1 + 8n}\right)\), which gives 14.651 when n = 100.
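
The ratio of successive probabilities gives a direct way to locate the most likely value of X. This is a small sketch assuming the ratio 2x(n + 1 - x)/((2n - x)(x - 1)); the function names `mode` and `approximate_mode` are illustrative.

```python
import math

def mode(n):
    """Most likely number of draws, found by climbing while P(x + 1) > P(x)."""
    x = 2
    # P(x + 1)/P(x) > 1 exactly when 2x(n + 1 - x) > (2n - x)(x - 1).
    while x <= n and 2 * x * (n + 1 - x) > (2 * n - x) * (x - 1):
        x += 1
    return x

def approximate_mode(n):
    """Positive root of x^2 - x - 2n = 0, where the ratio equals 1."""
    return (1 + math.sqrt(1 + 8 * n)) / 2
```

Setting the ratio equal to 1 and clearing denominators leaves the quadratic \(x^2 - x - 2n = 0\), which is where the expression \(\frac{1}{2}(1 + \sqrt{1 + 8n})\) comes from; the integer mode is the first integer above that root.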


7.2 EXPECTED VALUE 


The expected value \(\sum_{x=2}^{n+1} x \cdot P(X = x)\) does not simplify easily. Mathematica simplifies
this as 1 + Hypergeometric2F1[1, 1 - n, 1 - 2n, 2]. Hypergeometric functions are related
to probabilities encountered with the hypergeometric probability distribution and are
generally quite difficult to deal with.

Fortunately, it is possible to expand the hypergeometric function above to the series

\[
1 + \frac{2 - 2n}{1 - 2n} + \frac{4(1 - n)(2 - n)}{(1 - 2n)(2 - 2n)} + \frac{8(1 - n)(2 - n)(3 - n)}{(1 - 2n)(2 - 2n)(3 - 2n)}
\]
\[
+ \frac{16(1 - n)(2 - n)(3 - n)(4 - n)}{(1 - 2n)(2 - 2n)(3 - 2n)(4 - 2n)} + \frac{32(1 - n)(2 - n)(3 - n)(4 - n)(5 - n)}{(1 - 2n)(2 - 2n)(3 - 2n)(4 - 2n)(5 - 2n)} + \cdots
\]

Interestingly, this can be expressed in terms of Pochhammer functions where

Pochhammer[a, n] = a(a + 1)(a + 2) ··· (a + n - 1).

We then see that

\[
1 + \text{Hypergeometric2F1}[1, 1 - n, 1 - 2n, 2] = 1 + \sum_{k=0}^{\infty} \frac{2^k\,\text{Pochhammer}[1 - n, k]}{\text{Pochhammer}[1 - 2n, k]},
\]

which may make the expression appear easier, but in fact is just as complicated as the
original. One should be cautioned, though, that in computation enough terms must be taken so
that all of the infinitely many remaining terms are 0. For example, if n = 7, then 7 terms must be used.
The result is then always a rational number.
Here are some values of n followed by the expected values:
{2, 8/3}, {3, 16/5}, {4, 128/35}, {5, 256/63}, {6, 1024/231},
{7, 2048/429}, {8, 32768/6435}, {9, 65536/12155}, {10, 262144/46189},
{11, 524288/88179}, {12, 4194304/676039}.
One notices that the numerators of the expected values are each powers of 2, while the
denominators are all odd. This suggests that some factors of 2 have been cancelled from the
numerators and denominators. In addition, the denominators almost always
can be factored into prime factors that occur only once, suggesting that they arose from a
binomial coefficient. Restoring those factors of 2 gives a surprising result. For
example, for n = 10, the expected value is

\[
\frac{262144}{46189} = \frac{2^{18}}{46189} = \frac{2^{20}}{184756} = \frac{2^{20}}{\binom{20}{10}}.
\]

Other expected values follow a similar pattern and we see, for n pairs of socks in the drawer,
that

\[ E[X] = \frac{2^{2n}}{\binom{2n}{n}}. \]

For n = 100, the expected value is \(2^{200}/\binom{200}{100} = 17.747\).

Here is a graph of the expected values for n = 2, 3, ..., 100:

[Figure: expected value ("Mean") plotted against n for n = 2, 3, ..., 100.]

While these means appear to be curvilinear, a straight line approximation gives a 
p-value of the order 10~”, so the fit is fantastically good. For example, for n = 20 the 


expected value is

\[ \frac{2^{40}}{\binom{40}{20}} = \frac{274877906944}{34461632205} = 7.9763, \]

while the straight line fit gives 5.16173 + 0.13763(20) = 7.9143.
Since \(\binom{2n}{n}\) can be expressed as \(\frac{2^{2n}\,\Gamma(n + 1/2)}{\sqrt{\pi}\,\Gamma(n + 1)}\), it follows that

\[ E[X] = \frac{\sqrt{\pi}\,\Gamma(n + 1)}{\Gamma(n + 1/2)}. \]
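
The closed forms for the mean are easy to compare numerically. The sketch below assumes Python 3.8 or later (for `math.comb`); the function names `mean_exact` and `mean_gamma` are illustrative, and the gamma version works with logarithms only to avoid overflow for large n.

```python
import math
from fractions import Fraction as F

def mean_exact(n):
    """E[X] = 2^(2n) / C(2n, n) as an exact fraction."""
    return F(4 ** n, math.comb(2 * n, n))

def mean_gamma(n):
    """E[X] = sqrt(pi) * Gamma(n + 1) / Gamma(n + 1/2), computed through log-gamma."""
    return math.sqrt(math.pi) * math.exp(math.lgamma(n + 1) - math.lgamma(n + 0.5))
```

The exact version returns the reduced fractions listed earlier (for instance 2048/429 for n = 7), and the two versions agree to floating-point accuracy; for n = 100 both give about 17.747.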
7.3 VARIANCE


The variance is equally difficult.
Mathematica simplifies \(E[X^2]\) as

\[ \frac{4}{2n - 1}\,\text{HypergeometricPFQ}[\{3, 3, 1 - n\}, \{2, 2 - 2n\}, 2], \]

where the hypergeometric function can be expanded in a power series as

\[
1 + \frac{9(1 - n)}{2 - 2n} + \frac{48(1 - n)(2 - n)}{(2 - 2n)(3 - 2n)} + \frac{200(1 - n)(2 - n)(3 - n)}{(2 - 2n)(3 - 2n)(4 - 2n)}
\]
\[
+ \frac{720(1 - n)(2 - n)(3 - n)(4 - n)}{(2 - 2n)(3 - 2n)(4 - 2n)(5 - 2n)} + \frac{2352(1 - n)(2 - n)(3 - n)(4 - n)(5 - n)}{(2 - 2n)(3 - 2n)(4 - 2n)(5 - 2n)(6 - 2n)} + \cdots
\]

and so the variance can be written as

\[
\frac{4}{2n - 1}\left[1 + \sum_{k=1}^{\infty} \frac{(\text{Pochhammer}[3, k])^2 \cdot \text{Pochhammer}[1 - n, k] \cdot 2^k}{k!\,\text{Pochhammer}[2, k] \cdot \text{Pochhammer}[2 - 2n, k]}\right] - \left[\frac{2^{2n}}{\binom{2n}{n}}\right]^2.
\]

Some of the variances along with values of n are as follows:
{2, 2/9}, {3, 14/25}, {4, 1186/1225}, {5, 5654/3969}, ... ,
{20, 12352930670782172335394/1187604094232693162025}, ...

Here is a graph of a few of the values of \(E[X^2]\):

[Figure: \(E[X^2]\) plotted against n for n from 4 to 12.]

It is interesting that values of \(E[X^2]\) are related, but not simply, to the values of
E[X].
Here is a graph of the ratio of \(E[X^2]\) to E[X].

[Figure: ratio of \(E[X^2]\) to E[X] plotted against n, for n up to 100.]


A nonlinear least squares fit gives a p-value of the order 10-°"’ so the fit is excellent. 


7.4 OTHER “SOCKS” PROBLEMS 


One mathematical problem always leads to others. Here are some related problems for the 
reader: 


1. There are some red socks and some blue socks in a drawer. How many socks of 
each color must be in the drawer to make the probability of drawing a matching pair 
on the first two drawings 1/2? There are some interesting relationships between the 
values of the red socks as the size of the drawer increases. 


2. Suppose that the drawer contains n pairs of socks and that k socks have been drawn.
Show that the expected number of pairs drawn is \(\binom{k}{2}/(2n - 1)\).


3. Suppose 7 pairs of socks are in the drawer, but 2 of the pairs are identical yellow 
socks while the other 5 pairs are of different colors. It might be thought that this 
would reduce the expected value of X by 1, but this is not so. Show that E[X] = 
12482/3003 = 4.15651. 


7.5 COUPON COLLECTION AND RELATED PROBLEMS 


A fast food restaurant offers three prizes, one with each purchase. On average, how many 
visits must one make to collect all three prizes? 

How many times, on average, must a fair coin be tossed in order that both heads and
tails appear?

What is the expected number of throws with a fair die so that each of the faces appears 
at least once? 

How many random integers must be produced on average so that for the first time, each 
of the integers 0, 1, 2, ..., 9 has appeared? 

These are all variants of what is known as the Coupon Collector’s Problem [3], [11], [26].

We explore this problem here using Mathematica, which sheds considerable light on 
the problem, especially when the number of events to be seen is large. 
Three Prizes


Let us begin with a modest number of prizes to be received at a fast food restaurant, say 3. 
There are two approaches to the problem; we will show both of them in this case. 


Permutations 


First consider writing down some of the permutations indicating the order in which the 
prizes are collected. Let the random variable N denote the number of visits necessary to 
collect all the prizes and R the number of prizes to be collected. 


r = 3

If N = 3, then the three prizes are collected in three visits to the restaurant. There are
six orders in which the prizes may be collected: ABC, ACB, BAC, BCA, CAB, and CBA.
If the prizes are equally likely to appear, then the probability that N = 3 is

\[ P(N = 3) = 6/3^3 = 2/9. \]

Now suppose that N = 4. Mathematica can be used to create all the permutations. One of
the prizes must occur twice, another once, and the last prize to be collected must occur
only once.

To produce the permutations with N = 4, start with one of the 6 permutations of 
A, B, and C, say BAC. Now we preserve the order BAC and add one symbol — B or 
A — (since C must occupy the last place). There are two choices for the place to add the extra 
symbol — following the B or following the A. B can be inserted in two places producing 
BBAC and BABC. Inserting A in these places produces only one order, namely, BAAC. 

So each of the 6 orders of 3 symbols produces 3 orders of 4 symbols, or a total of 18 
orders. 

So \(P(N = 4) = 18/3^4 = 2/9\).

The reader may be interested to show that there are 42 distinct permutations for N = 5 
and 90 distinct permutations when N = 6, so 

P(N = 5) = 42/243 = 14/81 and P(N = 6) = 90/729 = 10/81. 
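
For small cases the counts of orders can be confirmed by brute force. The following sketch enumerates every sequence of visits and keeps those in which the collection is completed exactly on the last visit; the function name `completion_counts` is illustrative, and the approach is plain enumeration rather than the insertion argument used above.

```python
from itertools import product

def completion_counts(r, n_max):
    """Number of length-n prize sequences that first contain all r prizes at visit n."""
    counts = {}
    for n in range(r, n_max + 1):
        counts[n] = sum(
            1
            for seq in product(range(r), repeat=n)
            # All r prizes are present, but only r - 1 of them before the last visit.
            if len(set(seq)) == r and len(set(seq[:-1])) == r - 1
        )
    return counts

counts = completion_counts(3, 6)
probabilities = {n: c / 3 ** n for n, c in counts.items()}
```

Dividing each count by \(3^n\) gives P(N = n). Enumeration grows like \(r^n\), so it is only practical for the small cases discussed here.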

Continuing to count permutations becomes increasingly difficult, since duplications 
must be avoided. It becomes very difficult to change the probabilities with which the prizes 
occur (typically the prizes do not occur with equal probabilities, one prize often being rarer 
than the others). It turns out that there is an alternative method that does not have the dis- 
advantages of counting the permutations and allows us to alter the probabilities with which 
the prizes occur as well. 


An Alternative Approach 


We use the General Addition Law. Suppose that r = 3 and that each of the prizes occurs
with probability 1/3. It is easiest to calculate the probability that not all the prizes occur in
n trials and subtract this from 1. The probability that in n trials, at most two of the three
prizes occur is \(3(2/3)^n\) but we must subtract from this the probability that only one of
the three prizes occurs, \(3(1/3)^n\), so the probability that all three prizes occur in n trials is
\(1 - 3(2/3)^n + 3(1/3)^n\).
If we make the function \(f(n) = 1 - 3(2/3)^n + 3(1/3)^n\), then we may calculate
f(n) for various values of n to find these probabilities associated with values of
n: {(3, 2/9), (4, 4/9), (5, 50/81), (6, 20/27), (7, 602/729), (8, 644/729), (9, 6050/6561),
(10, 6220/6561)}.

But this table gives the probabilities that all three prizes occur in n trials or fewer. To
find the individual probabilities, we must subtract the previous entry from each entry; after
prepending the entry for three trials, we find this table:


{(3, 2/9), (4, 2/9), (5, 14/81), (6, 10/81), (7, 62/729), (8, 14/243), (9, 254/6561),
(10, 170/6561)}.


The sum of these probabilities is 0.948026, so about 95% of the time the three prizes will
occur within 10 trials.


Altering the Probabilities
It is very easy to alter the probabilities with which the prizes occur with this approach.
Suppose P(A) = 0.5, P(B) = 0.2, and P(C) = 0.3.
Then, in general, let

\[
\text{genprob}(n, pa, pb, pc) = 1 - (pa + pb)^n - (pa + pc)^n - (pb + pc)^n + pa^n + pb^n + pc^n.
\]

If the three prizes are equally likely, a table of these values is


{(3, 2/9), (4, 4/9), (5, 50/81), (6, 20/27), (7, 602/729), (8, 644/729), (9, 6050/6561),
(10, 6220/6561)},


which is the result we saw earlier. But now, we can alter the probabilities letting pa =
0.5, pb = 0.2, and pc = 0.3 to find


{(3, 0.18), (4, 0.36), (5, 0.507), (6, 0.621), (7, 0.708162), (8, 0.774648), (9, 0.825449),
(10, 0.864384)}.


To find the probabilities that all the prizes are won in exactly n + 1 trials, we subtract the
probability the event occurs in n trials from the probability the event occurs in n + 1 trials.
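
The same inclusion-exclusion idea extends to any number of prizes with any probabilities. This is a minimal sketch, assuming the probabilities are supplied as a list that sums to 1; the function name `prob_all_collected` is illustrative and is not the genprob function of the text, although it returns the same values for three prizes.

```python
from fractions import Fraction as F
from itertools import combinations

def prob_all_collected(n, probs):
    """P(every prize has appeared within n trials), by inclusion-exclusion."""
    total = 0
    for k in range(len(probs)):                  # k = number of prizes assumed missing
        for missing in combinations(probs, k):
            # All n trials avoid the k missing prizes with probability (1 - sum)^n.
            total += (-1) ** k * (1 - sum(missing)) ** n
    return total

equal = [prob_all_collected(n, [F(1, 3)] * 3) for n in range(3, 7)]
unequal = [prob_all_collected(n, [0.5, 0.2, 0.3]) for n in range(3, 8)]
```

With three prizes the terms are 1, minus the three two-prize probabilities raised to the nth power, plus the three single-prize probabilities raised to the nth power. Differences of successive values give the probability that the set is completed at exactly a given trial.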


A General Result 


It is evident, using the General Addition Law, that the probability that all r prizes are
collected in n trials is

\[
p(n, r) = 1 - \binom{r}{r-1}\left(\frac{r-1}{r}\right)^n + \binom{r}{r-2}\left(\frac{r-2}{r}\right)^n - \binom{r}{r-3}\left(\frac{r-3}{r}\right)^n + \cdots
= \sum_{i=0}^{r} (-1)^i \binom{r}{i}\left(\frac{r-i}{r}\right)^n.
\]

Our reasoning is similar to the reasoning we used in the case r = 3. \(\binom{r}{r-1}\left(\frac{r-1}{r}\right)^n\) is
the probability that at most r - 1 prizes appear in n trials; \(\binom{r}{r-2}\left(\frac{r-2}{r}\right)^n\) is the probability
that at most r - 2 prizes appear in n trials, and so on. These probabilities must be added or
subtracted in turn so that the result is exactly r prizes in n trials.

Then, it is fairly easy to show that

\[
p(n, r) - p(n - 1, r) = \sum_{i=1}^{r-1} (-1)^{i+1} \binom{r}{i}\left(\frac{r-i}{r}\right)^{n-1}\left(\frac{i}{r}\right),
\]

giving the probabilities for individual values of n. The probability that n = r is \(r!/r^r\), so a
complete table if we let r = 3 is


{(3, 2/9), (4, 2/9), (5, 14/81), (6, 10/81), (7, 62/729), (8, 14/243), (9, 254/6561),
(10, 170/6561), (11, 1022/59049)},


which checks our previous result.
Here are some results for small values of r: 


r= 4:{(4, 3/32), (5, 9/64), (6, 75/512), (7, 135/1024), (8, 903/8192), (9, 1449/16384), 
(10, 9075/131072), (11, 13995/262144), (12, 85503/2097152)} 
r=5:{(5, 24/625), (6, 48/625), (7, 312/3125), (8, 336/3125), (9, 40824/390625), 
(10, 37296 /390625), (11, 163704/1953125), (12, 27984/390625) } 
r = 6:{(6, 5/324), (7, 25/648), (8, 175/2916), (9, 875/11664), (10, 11585/139968), 
(11, 875/10368), (12, 616825/7558272)}. 
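
Both the cumulative probability and its differences translate directly into code. The sketch below assumes equally likely prizes and exact fractions; the function names `p_cumulative` and `p_exact` are illustrative labels for the two quantities, not names used in the text.

```python
from fractions import Fraction as F
from math import comb

def p_cumulative(n, r):
    """P(all r equally likely prizes have been collected within n trials)."""
    return sum((-1) ** i * comb(r, i) * F(r - i, r) ** n for i in range(r))

def p_exact(n, r):
    """P(the r-th distinct prize is collected at exactly trial n)."""
    return sum((-1) ** (i + 1) * comb(r, i) * F(r - i, r) ** (n - 1) * F(i, r)
               for i in range(1, r))

table_r4 = [(n, p_exact(n, 4)) for n in range(4, 8)]
```

The second function is the term-by-term difference of the first, since \(\left(\frac{r-i}{r}\right)^n - \left(\frac{r-i}{r}\right)^{n-1} = -\frac{i}{r}\left(\frac{r-i}{r}\right)^{n-1}\).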
Here is a graph for r = 6: 
Mathematica tells us that the probability that 100 equally likely premiums are collected
in 200 trials is \(4.311 \cdot 10^{-7}\).
Expectations and Variances


It is easy to calculate means and variances for various values of r. Since the sums used here
are infinite, we approximate them with a finite number of terms. Exact results will also be
given later.

Here are the means for r = 3 and r = 4, taken to 40 terms, where p(n, r) now denotes the
probability that the r prizes are first all collected at exactly trial n:

\[
\sum_{n=3}^{40} n \cdot p(n, 3) = 5.49999 \quad \text{and} \quad \sum_{n=4}^{40} n \cdot p(n, 4) = 8.33156.
\]

Mathematica allows us to calculate with large values of n. Here is the approximate expected
number of trials to collect 100 premiums (each equally likely), followed by a graph of the
probabilities:

\[ \sum_{n=100}^{1000} n \cdot p(n, 100) = 518.738. \]

[Figure: the probabilities p(n, 100) plotted against n.]
The following table shows that the maximum occurs when n = 460. 


{(457., 0.00378522), (458., 0.00378608), (459., 0.00378652), (460., 0.00378656), 
(461., 0.0037862), (462., 0.00378544) } 


Geometric Distribution 


The coupon collector’s problems are all instances of the geometric probability distribution
where \(P(X = x) = pq^{x-1}\), x = 1, 2, ..., where p is the probability of the event we await.

It can be shown that E[X] = 1/p and \(\mathrm{Var}[X] = q/p^2\).

So in the case of r prizes, the first prize is collected on the first visit; the probability
the next prize is found is (r - 1)/r, so the expected waiting time to collect the next prize
is r/(r - 1). The next prize occurs with probability (r - 2)/r, and so the expected waiting
time to collect all the prizes is

\[ \text{exp}(r) = 1 + \sum_{i=1}^{r-1} \frac{r}{r - i}. \]
And here is a table of some results for small values of r. Note that our approximations
for r = 3 and r = 4 are quite good:


{(2., 3.), (3., 5.5), (4., 8.33333), (5., 11.4167), (6., 14.7), 
(7., 18.15), (8., 21.7429), (9., 25.4607), (10., 29.2897)}. 


So the expected waiting time for a fair coin to show both faces is 3, and the expected waiting 
time for a fair die to show all six faces is 14.7. 
The expected waiting time for collecting 100 premiums is 518.738. 
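
The expected waiting time is a short sum of geometric means, so it can be evaluated exactly. This is a minimal sketch; the function name `expected_trials` is illustrative.

```python
from fractions import Fraction as F

def expected_trials(r):
    """Expected number of trials to collect all r equally likely prizes."""
    # After i distinct prizes are in hand, a new one appears with probability
    # (r - i)/r, so the wait for it is geometric with mean r/(r - i).
    return 1 + sum(F(r, r - i) for i in range(1, r))

table = [(r, float(expected_trials(r))) for r in range(2, 11)]
```

The sum equals r(1 + 1/2 + ... + 1/r), which is why the values grow a little faster than linearly in r.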


Variances 


Variances are easily calculated by Mathematica using the function

\[ \sum_{n=r}^{\infty} n^2\,p(n, r) - \left(\sum_{n=r}^{\infty} n\,p(n, r)\right)^2. \]

We find for r = 3 (using 60 terms in the sum) that the variance is 6.75 with standard
deviation 2.59808, and we find for r = 4 that the variance is 14.4441 with standard deviation
3.80053.

The variances using the geometric distribution are calculated using the function

\[ \text{variance}(r) = \sum_{i=1}^{r-1} \frac{r \cdot i}{(r - i)^2}. \]

Here is a table of standard deviations: 


{(3, 2.59808), (4, 3.80058), (5, 5.01733), (6, 6.2442), 
(7, 7.47851), (8, 8.71849), (9, 9.96295), (10, 11.211)}. 
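
Because the successive waiting times are independent geometric variables, their variances simply add. The sketch below evaluates that sum exactly; the function name `variance_trials` is illustrative.

```python
import math
from fractions import Fraction as F

def variance_trials(r):
    """Variance of the number of trials needed to collect all r equally likely prizes."""
    # With i prizes in hand, p = (r - i)/r and q = i/r, so q/p^2 = r*i/(r - i)^2.
    return sum(F(r * i, (r - i) ** 2) for i in range(1, r))

std_devs = [(r, math.sqrt(variance_trials(r))) for r in range(3, 11)]
```

For r = 3 the exact variance is 27/4 = 6.75, and for r = 4 it is 130/9, slightly above the truncated-sum value 14.4441 quoted above.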


Waiting for Each of the Integers 


Now we simulate looking for each of the digits 0, 1, ... , 9 for the first time. To investigate 
this, we first create random samples of these integers. We created 50 samples, each of size
60 (so that the probability that any digit fails to occur in 60 trials is very small).

Here is a typical random sample (this is the 23rd sample produced):

{9, 3, 7, 0, 2, 9, 9, 7, 7, 4, 3, 5, 0, 8, 0, 4, 9, 4, 7, 0, 7, 1, 3, 7, 6, 7, 1, 3, 8, 6,
8, 9, 6, 0, 3, 1, 0, 2, 3, 4, 9, 6, 7, 8, 6, 6, 7, 8, 8, 4, 1, 1, 0, 3, 1, 7, 2, 4, 7, 3}

The digits in order of appearance are 9, 3, 7, 0, 2, 4, 5, 8, 1, and 6 and these occurred
in positions 1, 2, 3, 4, 5, 10, 12, 14, 22, and 25, respectively, so we had to wait until the 25th
observation to see all the integers.

Mathematica can find these positions:

{{{4}, {13}, {15}, {20}, {34}, {37}, {53}}, {{22}, {27}, {36}, {51},
{52}, {55}}, {{5}, {38}, {57}}, {{2}, {11}, {23}, {28}, {35}, {39},
{54}, {60}}, {{10}, {16}, {18}, {40}, {50}, {58}}, {{12}}, {{25},
{30}, {33}, {42}, {45}, {46}}, {{3}, {8}, {9}, {19}, {21}, {24},
{26}, {43}, {47}, {56}, {59}}, {{14}, {29}, {31}, {44}, {48}, {49}},
{{1}, {6}, {7}, {17}, {32}, {41}}}


where the table is read this way: we find that 0 occurred in positions 4, 13, 15, 20, 34, 37,
and 53 while the integer 1 occurred in positions 22, 27, 36, 51, 52, and 55. But we are
interested only in the largest of the first positions of the digits, and in this case this
is 25, where 6 was the last digit to occur.

So for this sample, the waiting time was 25 observations before we saw all 10 digits. 

Next, we found the positions of each integer in every one of the samples (but we sup- 
press the output). 

Since we have all the positions of each of the integers in each of the samples, we need 
the maximum of the positions of the first entries: 


{29, 27, 22, 27, 28, 35, 26, 46, 26, 15, 33, 41, 17, 31, 20, 33, 26, 28, 
30, 27, 27, 17, 25, 28, 45, 30, 23, 46, 33, 50, 24, 22, 21, 26, 
23, 22, 24, 15, 23, 32, 19, 24, 43,22, 18, 27, 16, 25, 30, 27} 
The mean value of these positions is 27.48, which is close to our theoretical value
of 29.2897. The standard deviation of these values is 8.19218.
Here is a bar chart of the maximums of the first entries.

[Figure: bar chart of the maximums of the first entries; the horizontal axis runs from 1 to 49.]


The simulation here sheds great light on a difficult problem. 
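
A simulation of this kind takes only a few lines in a general-purpose language. The following is a minimal sketch in Python rather than Mathematica; the function names, the seed and the variable `SAMPLE_23` (the sample printed above) are illustrative, and a different seed gives different waiting times.

```python
import random

def completion_position(sample, items=10):
    """Position at which the last of the `items` values first appears, or None."""
    first = {}
    for position, value in enumerate(sample, start=1):
        first.setdefault(value, position)      # record first occurrences only
    return max(first.values()) if len(first) == items else None

def waiting_time(rng, items=10):
    """Draw random digits until every one of them has been seen."""
    seen, draws = set(), 0
    while len(seen) < items:
        seen.add(rng.randrange(items))
        draws += 1
    return draws

SAMPLE_23 = [9, 3, 7, 0, 2, 9, 9, 7, 7, 4, 3, 5, 0, 8, 0, 4, 9, 4, 7, 0,
             7, 1, 3, 7, 6, 7, 1, 3, 8, 6, 8, 9, 6, 0, 3, 1, 0, 2, 3, 4,
             9, 6, 7, 8, 6, 6, 7, 8, 8, 4, 1, 1, 0, 3, 1, 7, 2, 4, 7, 3]

rng = random.Random(2024)                      # fixed seed so the run is repeatable
times = [waiting_time(rng) for _ in range(50)]
sample_mean = sum(times) / len(times)
```

Drawing until all digits have appeared, instead of fixing the sample size at 60, removes the small chance that a sample never shows every digit.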
Conditional Expectations 


Suppose we have sampled n of the premiums in the coupon collector’s problem. On
average, how many of the distinct premiums do we have? Let us suppose there are three
premiums, A, B, and C, and that they occur with equal frequency. Let us also assume that
if all three premiums are collected, then this does not occur before the nth trial.

We take a specific example to see how the calculations are done. Let n = 6. Then the 
number of distinct items collected can be 1, 2, or 3. The number of orders in which the pre- 
miums are collected depend on the number of ways in which the integer 6 can be partitioned 
and then the number of permutations of A, B, and C determined by these partitions. 

We find the partitions of 6 into at most three parts: 


{(6), (5, 1), (4, 2), (4, 1, 1), (3, 3), {3, 2, 1), (2, 2, 2)} 


(6) is interpreted as all 6 premiums are the same. There are three ways in which this 
can occur: AAAAAA, BBBBBB, or CCCCCC. 

(5, 1) means that two of the premiums occur, one five times and the other once. There 
are (3) choices for the two premiums, @) choices for one premium to occur five times, and 
(2) ways in which the two premiums can be permuted. 

This gives us (3) * (7) * ‘@) = 36 ways in which this can occur. 

Similarly, the partition (4, 2) produces 


()-()-@- 


Finally, the partition {3, 3} produces 


()-(6)-« 


The partitions of six into three parts must be handled a bit differently since all three 
premiums are collected, but the last premium must complete the set. 

The partition (4, 1, 1) means that the last premium must be one of 3 while the first 5 
premiums can be collected in (?) * () * (3) = 30 ways in which this can occur. Similarly, 
the partition {3, 2, 1} gives (}) « (7) * (3) = 60 ways. 

The partition (2, 2, 2) is impossible since the three premiums would be collected before 
the 6th trial. 

So we have found that 1 premium can be collected in 3 ways, 2 premiums can
be collected in 36 + 90 + 60 = 186 ways, and three premiums can be collected
in 30 + 60 = 90 ways, giving the expected number of premiums after six trials as

$\frac{1 \cdot 3 + 2 \cdot 186 + 3 \cdot 90}{3 + 186 + 90} = \frac{645}{279} = 2.31183.$
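The count above can be checked by brute force. The following is a minimal sketch in Python (the book's own computations use Mathematica; the function name is illustrative). It assumes, as in the text, three equally likely premiums and discards any sequence in which the set is completed before the nth trial.

```python
from fractions import Fraction
from itertools import product

def expected_distinct(n, premiums="ABC"):
    """Enumerate all length-n sequences, keep those that do not complete
    the set before trial n, and average the number of distinct premiums."""
    counts = {}
    for seq in product(premiums, repeat=n):
        if len(set(seq[:-1])) == len(premiums):
            continue  # all premiums were already collected before trial n
        k = len(set(seq))
        counts[k] = counts.get(k, 0) + 1
    total = sum(counts.values())
    mean = Fraction(sum(k * c for k, c in counts.items()), total)
    return counts, mean

counts, mean = expected_distinct(6)
print(counts, float(mean))
```

For n = 6 the enumeration groups the 279 admissible sequences into 3, 186, and 90 sequences with one, two, and three distinct premiums.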


Other Expected Values 


In an entirely similar way, expected values can be found for other values of n. Here are the 
results which the reader may care to verify: 
n    Expected Value
3    2.1111
4    2.2381
5    2.2889
6    2.3118
7    2.4144


One notices that the expectation increases as n increases, but the increase in the expec- 
tation is modest when n is increased by 1. 

The procedure here becomes increasingly difficult as n increases since the number of 
partitions of the integers increases rapidly. 


Waiting for All the Sums on Two Dice 


The procedure here is very similar to that of waiting for all the integers to appear, but the
sums on two dice do not occur with equal frequency, so we must alter the sampling scheme
somewhat. The probabilities of sums 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12 are 1/36, 2/36, 3/36,
4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, and 1/36, so we take samples from the following
set:


(2,3, 3,4, 4, 4,5,5,5,5, 6, 6, 6, 6, 6, 7, 7, 7, 7,7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 9, 
10, 10, 10, 11, 11, 12). 


So we have taken, but will not show, 100 samples, each of size 300. This will minimize 
the chance that some of the sums will not appear. Here is a typical sample: 


(6, 6, 8, 7,5, 9, 4, 6, 10, 11, 11,7, 8, 11,6, 11, 7,5, 2,7, 10, 3,9, 4,5, 4,9, 7, 10, 6, 
8,9, 8, 8, 12, 8, 2, 8, 6, 6, 9, 7, 6, 6,5, 5,9, 2, 8,5, 11, 10, 7, 3, 10, 8, 8, 7, 10, 7, 
9, 7,9, 8, 12,6, 6,9, 10, 8,9, 4, 7, 7, 7,8, 7, 10, 7, 7, 8, 7,8, 11,5, 6, 7, 4, 3, 7,9, 
9, 8,4, 8, 8, 7, 6,5, 6, 10, 11, 10, 7, 10, 6, 7, 6, 5, 8, 6, 7, 8, 7, 7, 4, 6, 4, 8, 8, 8, 9, 
11, 3, 9, 8, 9, 8, 6, 10, 4, 2, 6, 11, 6, 6, 8, 4, 9, 7, 6, 6, 3, 11, 8, 6,5, 2,9, 8, 3,9, 
11,4, 8, 7, 3, 2, 6, 8, 6, 3, 8, 3, 10,5, 6, 11, 7,4, 7, 10, 6, 12, 6, 9, 7, 7, 8, 6, 7, 6, 
12, 4, 9, 8, 6, 10, 9, 10, 7, 6, 6, 4, 6, 11, 7, 10, 4, 4, 9, 6, 8, 3, 7,5, 6, 3, 7,5, 3,9, 
9,8, 11,8, 7,4, 8, 10,9, 11, 8,7, 8, 7, 7, 8, 2, 3, 3,6, 11,8, 12, 7,3, 7,48, 10, 
6, 6, 8,7, 10,4, 9, 10, 10, 6, 4,9, 7,7, 4,9, 12, 2,7,5,2,5,4, 10, 8, 12, 11, 4, 8, 
10,5, 6, 4, 8,5, 11,9, 11, 11, 7,4, 10, 7, 2,5, 11, 11,7, 9, 6, 8, 8, 3, 6, 3, 6, 4, 9, 10) 


Here are the frequencies with which the sums occurred in this sample: (10, 17, 26, 18, 
46, 49, 48, 31, 26, 22, 7) producing the following bar chart. 
[Bar chart of the frequencies of the sums 2 through 12 in this sample.]


So the sample appears to reflect, very approximately, the frequency of each of the sums.
We want to find the average number of tosses needed to obtain every sum, so for each sample we
located the first occurrence of each of the sums and then took the maximum of these
first-occurrence positions. The individual samples are not shown here.

We need of course only the maximum of the first positions here to see how many dice 
tosses it took to see each of the sums which in this case is 35. Here are the maximums from 
all the samples: 


(29, 168, 77, 33, 27, 44, 29, 35, 164, 35, 114, 31, 25, 27, 43, 66, 33, 42, 53, 83, 100, 99, 35,
38, 51, 128, 58, 40, 21, 49, 104, 58, 26, 45, 21, 58, 227, 78, 42, 165, 33, 85, 25, 32, 45,
56, 34, 80, 31, 60, 54, 42, 60, 41, 105, 26, 28, 122, 58, 31, 28, 38, 197, 101, 93, 43, 27,
68, 44, 48, 166, 36, 34, 50, 50, 37, 31, 40, 76, 62, 117, 57, 41, 66, 29, 56, 71, 32, 61, 30, 36,
40, 45, 96, 22, 108, 32, 142, 28, 35)


The mean of this data is 60.62 with a standard deviation of 41.1913. 
Here is a bar chart of this data: 


[Bar chart of the 100 maximums.]


So the maximums are quite variable. In doing the above procedure 50 times, we found 
that the expected number of tosses to see all the sums is about 61.25. 
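As a cross-check on the simulated value, the expected waiting time can also be computed exactly. The sketch below is in Python rather than Mathematica and is not the sampling procedure described above: it applies inclusion-exclusion to the identity E[max] = sum over nonempty subsets S of (-1)^(|S|+1) / P(S), where P(S) is the chance that a single toss gives a sum in S. The function name is illustrative.

```python
from fractions import Fraction
from itertools import combinations

def expected_tosses_for_all_sums():
    """Exact expected number of tosses of two fair dice until every sum
    from 2 to 12 has appeared, by inclusion-exclusion over subsets of sums."""
    ways = {s: 6 - abs(s - 7) for s in range(2, 13)}  # 1, 2, ..., 6, ..., 2, 1
    sums = list(ways)
    total = Fraction(0)
    for k in range(1, len(sums) + 1):
        sign = (-1) ** (k + 1)
        for subset in combinations(sums, k):
            p_subset = Fraction(sum(ways[s] for s in subset), 36)
            total += sign / p_subset  # expected wait for the first sum in subset
    return total

print(float(expected_tosses_for_all_sums()))
```

The exact value comes out a little above 61, in line with the simulated estimate of about 61.25.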
CONCLUSION


The coupon collector’s problem provides many mathematical challenges and interesting 
ways in which Mathematica can provide insight into the problem and its many facets. The 
reader is encouraged to consult the following references [3, 11, 26] for more information. 


JACKKNIFED REGRESSION AND THE BOOTSTRAP 


We consider here some statistical techniques that are recent relative to most
standard statistical procedures and that depend heavily on fast and
capable computer programs.

The computations for the techniques discussed here were developed using Mathemat- 
ica 10, although other computer programs may be capable of carrying out these procedures 
as well. 


Jackknifed Regression 


With some frequency, data sets show one or more data points that have significant influence 
over the usual least squares regression line and create least squares regression lines that are 
not as representative of the data as they might be. 

Jackknifed regression is a technique that goes through the data set and eliminates 
exactly one data point from the data set at a time and computes the resultant least 
squares line. 

We use an obviously created-for-the-purpose data set from Anscombe [1]: 


x   4     5     6     7     8     9     10    11    12    13     14
y   5.39  5.73  6.08  6.42  6.77  7.11  7.46  7.81  8.15  12.74  8.84


The 10th data point (13, 12.74) appears to be an outlier. This is obvious from a graph 
of the data along with the least squares regression line shown in Figure 7.1. 

The least squares regression line is Y = 3.00245 + 0.499727X. The analysis of variance 
is shown in Table 7.1. 

The fit is very good, even with the (apparent) outlier included. We will soon show a 
test verifying that the point (13, 12.74) is indeed an outlier. For now, consider the jackknifed 
procedure where exactly one point is omitted from the data set at a time and the resulting 
regression lines are computed. While Mathematica will produce all 11 least squares lines
and their analyses of variance, we show only the lines and analyses when the first, 10th, and
11th points are omitted.

Omitting the first point, (4, 5.39), the least squares line is y = 2.71745 + 0.525636x and
the analysis of variance is shown in Table 7.2.

Omitting the 10th point, (13, 12.74), the strongly suspected outlier, the least squares
line is y = 4.00565 + 0.34539x with analysis of variance shown in Table 7.3.

This is an astoundingly small p-value. Finally, we omit the last point (14, 8.84), giv- 
ing the least squares line y = 2.46176 + 0.57697x with corresponding analysis of variance 
shown in Table 7.4. 
Figure 7.1 Data and least squares regression line.


Table 7.1
         DF   SS        MS        F-Statistic   P-Value
x         1   27.47     27.47     17.9723       0.00217631
Error     9   13.7562   1.52847
Total    10   41.2262

Table 7.2
         DF   SS        MS        F-Statistic   P-Value
x         1   22.7942   22.7942   13.4731       0.00630429
Error     8   13.5347   1.69183
Total     9   36.3289

Table 7.3
         DF   SS            MS               F-Statistic      P-Value
x         1   11.0228       11.0228          1.16069 x 10^6   6.17086 x 10^-22
Error     8   0.000075974   9.49675 x 10^-6
Total     9   11.0228

Table 7.4
         DF   SS        MS        F-Statistic   P-Value
x         1   27.4638   27.4638   18.6396       0.00255516
Error     8   11.7873   1.47341
Total     9   39.251


The 11 different lines and analyses of variance give the following p-values: 


Point omitted   p-value
(4, 5.39)       0.0063
(5, 5.73)       0.0056
(6, 6.08)       0.0050
(7, 6.42)       0.0045
(8, 6.77)       0.0041
(9, 7.11)       0.0038
(10, 7.46)      0.0036
(11, 7.81)      0.0034
(12, 8.15)      0.0032
(13, 12.74)     6 x 10^-22
(14, 8.84)      0.0026


It is also interesting to compare the discrepancies between the observed y values and 
the predicted values. First, here are the discrepancies using the least squares line for all the 
points: 


{0.388642, 0.228915, 0.079188, -0.080539, -0.230266, -0.389993,
-0.53972, -0.689447, -0.849174, 3.2411, -1.15863}.

The mean value here is $7 \cdot 10^{-6}$.

Here are the discrepancies using the least squares line when the point (13, 12.74) is
omitted:

{0.00279, -0.0026, 0.00201, -0.00338, 0.00123, -0.00416, 0.00045, 0.00506,
-0.00033, -0.00111}.

The mean value here is $4 \cdot 10^{-6}$, less than that for the overall least squares line, due to the
presence of the outlier.
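The jackknifed lines themselves are easy to reproduce. The following is a minimal Python sketch (the book uses Mathematica; the helper names `least_squares` and `jackknife_lines` are illustrative) that fits the line to all 11 points and then to each set of 10 points obtained by omitting one point.

```python
# Anscombe data as listed in the text, ordered by x.
points = [(4, 5.39), (5, 5.73), (6, 6.08), (7, 6.42), (8, 6.77), (9, 7.11),
          (10, 7.46), (11, 7.81), (12, 8.15), (13, 12.74), (14, 8.84)]

def least_squares(pts):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(pts)
    xbar = sum(x for x, _ in pts) / n
    ybar = sum(y for _, y in pts) / n
    sxx = sum((x - xbar) ** 2 for x, _ in pts)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pts)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def jackknife_lines(pts):
    """Least squares line with each point omitted in turn."""
    return [least_squares(pts[:i] + pts[i + 1:]) for i in range(len(pts))]

print(least_squares(points))
for omitted, line in zip(points, jackknife_lines(points)):
    print(omitted, line)
```

The line with the 10th point omitted has a noticeably smaller slope than the others, which is the effect the tables above quantify.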


7.8 COOK’S DISTANCE 


We have been claiming that the point (13, 12.74) is an outlier, as indeed it appears to be 
from Figure 7.1, but we have not offered any substantial mathematical reason for this. R.D. 
Cook [5] has proposed a distance, commonly known as Cook’s d, a quantity computed when 
each of the data points is omitted from the analysis one at a time. The computations go as 
follows: 

For the ith data point, let $e_i$ denote the residual at that point and let

$h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}$ and $d_i = \frac{e_i^2}{2 \cdot MSE} \cdot \frac{h_i}{(1 - h_i)^2}$

for the ith data point, where MSE is the mean square for error in the least squares regression
analysis of variance. The computation below uses MSE = 1.52847, the value in Table 7.1 for the fit to all 11 points.

For the 10th data point, (13, 12.74), we find that $h_{10} = \frac{1}{11} + \frac{(13 - 9)^2}{110} = \frac{13}{55}$ while

$d_{10} = \frac{(12.74 - 9.4989)^2}{2 \cdot 1.52847} \cdot \frac{13/55}{(1 - 13/55)^2} = 1.39285.$


Generally, the value of $d_i$ is regarded as significant if it exceeds the 50th percentile of
the F[2, n - 2] distribution which in this case is 0.743492, so the data point is significant.
A complete table of Cook's d for this data set is


{0.0338173, 0.00694781, 0.000517641, 0.000354629, 0.00214148, 0.00547314,
0.0117646, 0.0259839, 0.0595359, 1.39285, 0.300571},


so our suspected influential point is, in fact, the only significant point. 

The values of $h_i$ are frequently used by themselves to detect influential points. It is easy
to verify that $\sum_{i=1}^{n} h_i = 2$, so the mean value is 2/n. Any value exceeding this is regarded
as influential.

In this case, the critical value is then 2/11 = 0.18182. A complete set of values of $h_i$
is as follows:


{0.318182, 0.236364, 0.172727, 0.127273, 0.1, 0.0909091, 0.1, 0.127273, 0.172727, 
0.236364, 0.318182}. 


Here one would conclude that several of the points are influential, a conclusion not sup- 
ported by the values of Cook’s d. 
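The quantities $h_i$ and $d_i$ can be computed directly from the formulas above. The following Python sketch is illustrative only (the book's values come from Mathematica, and the function name is an assumption); it uses the mean square error of the fit to all 11 points, which is the value that reproduces the numbers quoted in the text.

```python
points = [(4, 5.39), (5, 5.73), (6, 6.08), (7, 6.42), (8, 6.77), (9, 7.11),
          (10, 7.46), (11, 7.81), (12, 8.15), (13, 12.74), (14, 8.84)]

def leverages_and_cooks_d(pts):
    """Return the lists (h_i) and (d_i) for a simple linear regression."""
    n = len(pts)
    xbar = sum(x for x, _ in pts) / n
    ybar = sum(y for _, y in pts) / n
    sxx = sum((x - xbar) ** 2 for x, _ in pts)
    slope = sum((x - xbar) * (y - ybar) for x, y in pts) / sxx
    intercept = ybar - slope * xbar
    residuals = [y - (intercept + slope * x) for x, y in pts]
    mse = sum(e * e for e in residuals) / (n - 2)  # error mean square, full fit
    h = [1 / n + (x - xbar) ** 2 / sxx for x, _ in pts]
    d = [e * e / (2 * mse) * hi / (1 - hi) ** 2 for e, hi in zip(residuals, h)]
    return h, d

h, d = leverages_and_cooks_d(points)
print(h)
print(d)
```

Only the 10th entry of `d` exceeds the 0.743492 threshold quoted above, while several entries of `h` exceed 2/11.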


7.9 THE BOOTSTRAP 


It is the common case that one has just one random sample to deal with. While the sample
may be indicative of the general situation, it is not by itself useful in determining the
sampling distribution of a statistic. It takes many random samples of size n, for example,
from a normal distribution with variance $\sigma^2$ to determine that the sample mean has standard
deviation $\sigma / \sqrt{n}$.

Efron [10] has developed a clever way to turn one random sample into many. He calls
the procedure the Bootstrap, by analogy with raising oneself by one's own bootstraps, a
physically impossible endeavor that turns out to be a very real mathematical one. Here is
how it works; we take a specific example to illustrate it.


Example 7.9.1 


Suppose we wish to discover the standard deviation of the median of samples chosen from 
a gamma distribution. This is not something that is commonly known! Figure 7.2 shows 
the population. 
Figure 7.2 A gamma distribution.


The distribution is decidedly nonnormal. To do the bootstrap, we draw a sample of size 
40 from the distribution. Here are the values in the sample: 

{ 1.51239, 1.22365, 3.47943, 5.86536, 3.3359, 1.72314, 0.130842, 1.62945, 4.51917, 
1.09428, 0.736489, 0.511442, 0.926443, 3.99601, 1.98945, 5.31002, 0.597938, 
3.31717, 1.98591, 1.22595, 1.95226, 1.67722, 1.57653, 1.94384, 1.11182, 

0.273504, 0.27343, 0.306156, 3.70396, 3.32532, 2.51426, 2.87691, 1.42218, 
1.47679, 1.40976, 1.42221, 2.88933, 2.9803, 5.83437, 1.80322}. 
The bootstrap procedure consists of drawing samples from this single sample, creating a
number of samples, hence the bootstrap. We take 1000 samples of size 20 each, sampling
without replacement. (If these samples were of size 40, sampling would have to be done
with replacement.) We calculate the median of each sample and get a probability
distribution for the median. A histogram of the results is shown in Figure 7.3.

We now find that the mean of these medians is 1.78865 with standard deviation
0.359114.
Figure 7.3 Medians of bootstrap samples.
It is very interesting that we can learn a bit about a (to say the least) statistic of rare
interest by proceeding from a single sample!
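The resampling step of Example 7.9.1 can be sketched as follows. This is a Python illustration, not the book's Mathematica notebook; the function name and the seed are assumptions, and because the resamples are random the results will not match the quoted 1.78865 and 0.359114 exactly.

```python
import random
import statistics

# The 40 sample values listed in the example.
sample = [1.51239, 1.22365, 3.47943, 5.86536, 3.3359, 1.72314, 0.130842,
          1.62945, 4.51917, 1.09428, 0.736489, 0.511442, 0.926443, 3.99601,
          1.98945, 5.31002, 0.597938, 3.31717, 1.98591, 1.22595, 1.95226,
          1.67722, 1.57653, 1.94384, 1.11182, 0.273504, 0.27343, 0.306156,
          3.70396, 3.32532, 2.51426, 2.87691, 1.42218, 1.47679, 1.40976,
          1.42221, 2.88933, 2.9803, 5.83437, 1.80322]

def resampled_medians(data, size=20, reps=1000, seed=1):
    """Medians of `reps` subsamples of `size` values drawn without
    replacement from `data`, as described in the example."""
    rng = random.Random(seed)
    return [statistics.median(rng.sample(data, size)) for _ in range(reps)]

medians = resampled_medians(sample)
print(statistics.mean(medians), statistics.stdev(medians))
```

Replacing `rng.sample(data, size)` with `rng.choices(data, k=size)` would draw with replacement instead, which is the form needed when the resamples are as large as the original sample.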

We show one more example. The reader will no doubt find other examples of this 
technique. 


Example 7.9.2 
Suppose we wish to discover the expected value of the range in samples from the standard
N(0, 1) distribution.

As we proceeded in the median example, we first draw a sample of 100 from the N(0, 1)
distribution. We show a portion of this sample:


{0.353945, 0.949031, 0.930671, 0.868072, -1.64129, 1.03884, -0.229624, 0.261774,
-0.203825, -1.61538, 1.04087, -0.476678, -0.763087, 1.00335, 2.51053,
-0.340539, -1.14323, 0.159024, -1.62462, -1.08409, -0.450556, -1.89815,
0.618595, 1.23218, 0.96988, ... }.


Then we selected 1000 samples, each of size 20, from this sample. Here is one of the 
samples: 


{1.33482, -1.06213, -0.308785, 0.551384, -1.61538, 0.261774, -0.965737, 0.969888,
-1.25885, 0.515682, 1.42721, -1.43946, -0.229624, -0.203825,
-1.30396, 0.969888, 0.596313, -1.18455, -0.571001, 0.618595}.


The range of each sample was calculated. A histogram of these results is shown in 
Figure 7.4. 


Figure 7.4 Range of bootstrap samples.


The expected value of the range is 3.36013 with standard deviation 0.575379. 
7.10 ON WALDEGRAVE’S PROBLEM


Waldegrave, an English gentleman, in 1711 or so proposed the following problem: 

r+ 1 players play each other in a game in which either player playing has probability 
1/2 of winning an individual game. They play in order until one player has beaten each 
of the other players consecutively. Players who lose an individual game nevertheless retain 
their position in the playing order. 


Three Players 


Consider first the situation in which there are three players, say A, B, and C. Denote by $X_Y$
that player X defeats player Y, with consecutive symbols denoting subsequent games and
their outcomes. One player must then defeat each of the others consecutively.

There are only two ways in which the game can end in two trials: $A_B A_C$ or $B_A B_C$. There
are only two ways in which the game can end in three trials: $A_B C_A C_B$ or $B_A C_B C_A$. In fact,
there are only two ways in which the game can end in n trials.

Let the random variable N denote the length of the series until a winner is established.
Suppose the game ends on the nth game. The previous sequence of games can have no two
successive games won by the same player, so the winner must alternate until the winner of
game $n - 1$ also wins the nth game. Since the first game must be $A_B$ or $B_A$, there are only
two ways in which the game can end on the nth game.


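Then $P(N = n) = 2 \cdot \left(\frac{1}{2}\right)^n = \frac{1}{2^{n-1}}$ for n = 2, 3, ... and the geometric series

$\sum_{n=2}^{\infty} \frac{1}{2^{n-1}} = 1$, as it should be.

$E(N) = 2 \cdot \frac{1}{2} + 3 \cdot \frac{1}{2^2} + 4 \cdot \frac{1}{2^3} + \cdots$

so $\frac{1}{2} E(N) = 2 \cdot \frac{1}{2^2} + 3 \cdot \frac{1}{2^3} + 4 \cdot \frac{1}{2^4} + \cdots$

giving

$E(N) - \frac{1}{2} E(N) = 2 \cdot \frac{1}{2} + 1 \cdot \frac{1}{2^2} + 1 \cdot \frac{1}{2^3} + \cdots = 1 + \frac{1/4}{1 - 1/2} = \frac{3}{2}$, and so E(N) = 3.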
2 


PROBABILITIES OF WINNING 


For three players, it is not difficult to compute the probabilities that each of the players will 
win the game. (We will show another way to do this later.) 
Consider how player A can win the game. 
For the sequence $A_B A_C$, the game lasts two plays. If the sequence $A_B C_A B_C A_B A_C$ occurs,
A wins in 5 plays. But the initial sequence $A_B C_A B_C$ can occur any number of times before
being followed by $A_B A_C$, and so A can win the game this way in 5, 8, 11, 14, ... games.

A final possibility for A to win the game is that the sequence $B_A C_B A_C A_B$ occurs and
A wins in four games. And, similar to the previous possibility, the sequence $B_A C_B A_C$ can
occur any number of times before A beats B for the final game. So A can also win the game
in 4, 7, 10, ... games.

Putting all this together with the geometric series involved, we see that the probability
A wins the game is

$\frac{(1/2)^2}{1 - (1/2)^3} + \frac{(1/2)^4}{1 - (1/2)^3} = \frac{5}{14}.$

Now consider the ways in which B can win the series. B could win in two games by the
series $B_A B_C$. Another possibility is that the sequence $B_A C_B A_C B_A B_C$ occurs, but the sequence
$B_A C_B A_C$ can occur any number of times before $B_A B_C$ occurs, so B can win in 5, 8, 11, ...
games.

A final possibility is that the sequence $A_B C_A B_C$ occurs followed by $B_A$, but again the
sequence $A_B C_A B_C$ can occur any number of times before B beats A and so B can win in
4, 7, 10, ... games.

We conclude that the probability B wins the series is exactly the same as the probability 
A wins the series. But this is probably evident since each player has probability 1/2 of 
winning the first game. 


So the probability C wins the series is $1 - \frac{5}{14} - \frac{5}{14} = \frac{2}{7}$; the sample points for which
C wins the game are easily found as well. In this case, the game ends in 3, 6, 9, ... trials.
Note that there are two ways in which C can win the game in 3, 6, 9, ... trials.


In the case of three players, the probabilities that each wins the game are not that far
apart since $\frac{5}{14} = 0.35714$ and $\frac{2}{7} = 0.28571$. This is a point we will return to, but for the
moment consider adding a player to the game.
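The three-player results can be checked by following the game forward one play at a time. The sketch below is in Python (not the book's Mathematica), and the state representation is an assumption made for illustration: each state records the current winner, the next opponent, and the player just beaten.

```python
from fractions import Fraction

def three_player_series(max_games=120):
    """Track the three-player game exactly for max_games plays.
    Returns each player's winning probability and the expected length,
    both truncated at max_games (the neglected tail is about 2**-max_games)."""
    half = Fraction(1, 2)
    win = {"A": Fraction(0), "B": Fraction(0), "C": Fraction(0)}
    expected_length = Fraction(0)
    # After game 1: (current winner, next opponent, player just beaten).
    states = {("A", "C", "B"): half, ("B", "C", "A"): half}
    for game in range(2, max_games + 1):
        next_states = {}
        for (champ, opponent, beaten), prob in states.items():
            # The current winner wins again and has now beaten both others.
            win[champ] += prob * half
            expected_length += game * prob * half
            # Otherwise the opponent takes over and next meets the idle player.
            key = (opponent, beaten, champ)
            next_states[key] = next_states.get(key, 0) + prob * half
        states = next_states
    return win, expected_length

win, length = three_player_series()
print({k: float(v) for k, v in win.items()}, float(length))
```

The truncated sums agree with 5/14, 5/14, 2/7 and E(N) = 3 to many decimal places.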


MORE THAN THREE PLAYERS 


Adding even a single player makes the game considerably more difficult. However, some 
interesting patterns evolve which we will soon see. Consider the situation for four players. 
Let us write out some sample points for various lengths of the game: 


Length of the game Sample points 

n=3 A,A-Ap or B,B-Bp 

n=4 A,C,C,Cp or B,CpCp Cy 
ApC,BBpB, 

“= AA DD gD 
ByCgDcDDz 
B,BCDgD Dy 



Length of the game Sample points 


Ap, CyDcApAgAc 
ApAcD,BpB,Bc 
AgC,CpA-ApAg 
By CpDcApAgAc 
BB eDgCpC,Cg 
B,CgCpAcApAg 


n=6 


We could go on, but it is obvious that the situation is much more complex than that for 
three players. Yet there is a pattern here that will make us able to create the sample points 
for any value of N. 

Consider the sample point $A_B A_C A_D$ for n = 3. Change the winner of the third game,
and let that winner win the series, producing the point $A_B A_C D_A D_B D_C$. Change the winner
of the third game in the sample point $B_A B_C B_D$, and let that winner go on to win the series
to produce the point $B_A B_C D_B D_C D_A$.

Consider the sample points for n = 4: 


$A_B C_A C_B C_D$
$B_A C_B C_D C_A$


Change the winner of the third game, and let that player win the series to produce the
points $A_B C_A B_C B_D B_A$ and $B_A C_B D_C D_A D_B$, obtaining all the ways the game can end in five
trials.

Similarly, the points for n = 4 and n = 5 can be used to produce all the sample points
for n = 6.


Finally, we show how the sample points for n = 5 and n = 6 can be used to produce 
the 10 sample points for n = 7. On the left, we show the sample points for n = 5 andn = 6. 
Change the winner of the fifth game, and let that winner go on to win the series to find the 
sample points in the right-hand column: 


AgC,D-D Dg > AgpC,DcD,BpBsBe 
ApACD,DgDc > ApAcD,DpCpCyCy 
ByCgDcDyDz > ByCgDeD,BpB,Bc 
ByBCDgD (D4 > ByBCDgDcApAgAc 
ApCyDcApAgAc > AgC,DcApB,BcBp 
ApACD,BpByBo > AgAcD,BpApAcAp 
AC, CpAcApAg > ApCyCpAcD,DgDc 
ByCgDcApApAc > ByCgDcApB,BBp 
B,BDgCpCyCg > ByBCDgCpAcAgAD 
ByCgCpAcApAg > ByCgCpAcDsPpD¢ 


The reason the sample points for a series lasting n trials depend on the sample points for
$n - 1$ and $n - 2$ trials is fairly simple: if the series ends in some player winning the last three
games, then either the winner of the $n - 1$ game also wins the nth game or the winner of
the $n - 2$ game also wins the last two games.

So if we look at the number of ways the game can end, we find the series 2, 2, 4, 6, 10, 16, 26,
42, ....

Recall that the Fibonacci series is 1, 1, 2, 3, 5, 8, 13, 21, ... where successive terms are
found by adding the previous two terms. Our series is exactly similar except that we begin
with 2 and 2. We will return later to this point.


r+1 Players 


The argument above can be extended to the case of r + 1 players. To find the number
of ways the game can end in n trials, consider the number of ways the game can end in
$n - i$ trials. Change the winner of the $(n - r + 1)$st trial, and let that winner go on to win
the next $r - 1$ trials. So if a(n) denotes the number of ways the game can end in n trials,
then

$a(n) = a(n - 1) + a(n - 2) + \cdots + a(n - r + 1).$

If p(n) denotes the probability the game ends in n trials, then
$p(n) = a(n)/2^n$ and so $2^n p(n) = 2^{n-1} p(n - 1) + 2^{n-2} p(n - 2) + \cdots + 2^{n-r+1} p(n - r + 1)$ or

$p(n) = \frac{1}{2} p(n - 1) + \frac{1}{4} p(n - 2) + \cdots + \left(\frac{1}{2}\right)^{r-1} p(n - r + 1).$


This recursion can be easily used with a system such as Mathematica to find the probabilities 
the game ends in any number of trials for any number of players. 
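The recursion is short to program. The following Python sketch (the book uses Mathematica; the function names are illustrative) assumes the indexing in which the shortest possible series has length r, so that a(r) = 2 and a(n) = 0 for n < r. A table built on a different convention for n may be shifted by one relative to this.

```python
from fractions import Fraction

def ways_to_end(r, n_max):
    """a(n) for n = r, ..., n_max with r + 1 players, using
    a(n) = a(n-1) + ... + a(n-r+1), a(r) = 2, and a(n) = 0 for n < r."""
    a = {r: 2}
    for n in range(r + 1, n_max + 1):
        a[n] = sum(a.get(n - i, 0) for i in range(1, r))
    return a

def ending_probabilities(r, n_max):
    """p(n) = a(n) / 2**n as exact fractions."""
    return {n: Fraction(c, 2 ** n) for n, c in ways_to_end(r, n_max).items()}

def expected_length(r, n_max=3000):
    """Partial sum of n * p(n) up to n_max, as a float."""
    return float(sum(n * p for n, p in ending_probabilities(r, n_max).items()))

print(list(ways_to_end(3, 10).values()))  # four players
print(expected_length(3))
```

For four players the counts begin 2, 2, 4, 6, 10, 16, 26, 42, matching the series found above.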

For seven players, here are the number of trials in which the game ends followed by
their probabilities:


n     Probability

5     1/32 = 0.03125
6     1/64 = 0.015625
7     1/64 = 0.015625
8     1/64 = 0.015625
9     1/64 = 0.015625
10    1/64 = 0.015625
11    31/2048 = 0.015137
12    61/4096 = 0.014893
13    15/1024 = 0.014648
14    59/4096 = 0.014404
A graph of these probabilities is as follows.

[Graph of the probability that the game ends in n trials, for n from 5 to 15.]


As r increases, one finds graphs very similar to that earlier. 

Here are two probabilities that would be very difficult to do without a computer algebra
system. The probability the game lasts n trials of course decreases as n increases. The
probability the game lasts 150 trials is

$\frac{1989179797398599794811479787771439644489521}{1427247692705959881058285969449495136382746624} = 0.00139372.$

With 20 players, the probability the game ends in 30 trials is $\frac{1}{524288} = 1.90735 \times 10^{-6}$.


Probabilities of Each Player 


We found that the probability A wins the series is equal to the probability B wins the series
when there are three players. It is obvious that, since the game is fair, the probability A
wins is the same as the probability B wins no matter the number of players. Finding the
probability of a given player winning the series is considerably more difficult than in the
case of three players. Bernoulli proved, numbering the players now and letting $p_i$ denote
the probability that player i wins the series, that

$p_1 = p_2$ and $p_{i+1} = \frac{2^r}{1 + 2^r} \, p_i$ for i = 2, 3, ..., r, where there are r + 1 players in the game.

We find then the following probabilities for some values of r: 


r   p_A               p_B               p_C               p_D               p_E               p_F
2   5/14              5/14              4/14
3   81/298            81/298            72/298            64/298
4   4913/22898        4913/22898        4624/22898        4352/22898        4096/22898
5   1185921/6766882   1185921/6766882   1149984/6766882   1115136/6766882   1081344/6766882   1048576/6766882
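The table can be regenerated from Bernoulli's relation together with the requirement that the probabilities sum to 1. This is a Python sketch with an illustrative function name, not the book's code.

```python
from fractions import Fraction

def winning_probabilities(r):
    """Probabilities for the r + 1 players, from p_1 = p_2 and
    p_{i+1} = p_i * 2**r / (1 + 2**r) for i = 2, ..., r, normalized to sum to 1."""
    ratio = Fraction(2 ** r, 1 + 2 ** r)
    weights = [Fraction(1), Fraction(1)]
    for _ in range(r - 1):
        weights.append(weights[-1] * ratio)
    total = sum(weights)
    return [w / total for w in weights]

for r in range(2, 6):
    print(r, winning_probabilities(r))
```

The fractions are printed in lowest terms, so 4/14 appears as 2/7 and 72/298 as 36/149.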


Since the fractions in the above-mentioned table are not simplified, it is clear that as the
number of players increases, the probability that any one of them wins the series approaches
$\frac{1}{r + 1}$. That is also obvious since the factor relating successive probabilities,
$\frac{2^r}{1 + 2^r}$, approaches 1 fairly rapidly. For 100 players, the probability that any one of them
wins the series is very close to 0.01. For a proof of Bernoulli’s result, see Hald [17], p. 378ff.


Expected Length of the Series 


The expected length of the series for more than three players also becomes quite difficult.
If we look at the expected length of the series for four players, we find that

$E[N] = 3 \cdot 2 \cdot \left(\frac{1}{2}\right)^3 + 4 \cdot 2 \cdot \left(\frac{1}{2}\right)^4 + 5 \cdot 4 \cdot \left(\frac{1}{2}\right)^5 + 6 \cdot 6 \cdot \left(\frac{1}{2}\right)^6 + 7 \cdot 10 \cdot \left(\frac{1}{2}\right)^7 + 8 \cdot 16 \cdot \left(\frac{1}{2}\right)^8 + \cdots,$

a challenging series to add, to say the least.

Mathematica, however, finds the sum to be 7 after using about 80 terms in the series,
so the series is not only difficult, but it converges very slowly.

For five players, the series converges to 15, and for six players, the series converges to
31. We conjecture, and offer no proof of this, that


$E[N] = 2^r - 1$ for r + 1 players.


Fibonacci Series 


For four players, we found the Fibonacci-like series 2, 2, 4, 6, 10, 16, 26, ..., where after
starting with 2, 2 we find successive terms by adding the previous two terms. We also found
that the number of points in the sample space follows this sequence.

It is interesting to note that this is twice the usual Fibonacci series, and the usual
Fibonacci series is exactly the series one gets if we consider flipping a fair coin and waiting
for two heads, HH, in a row. The sample space in that case is


HH
THH
HTHH   TTHH
HTTHH   THTHH   TTTHH


where the number of sample points in each successive
sequence is found by adding the number of points in the previous two sequences, producing
the usual Fibonacci series. The reason for this is that if the series is to end in HH in n tosses,
it must either begin with T followed by a sequence ending in HH in the remaining $n - 1$ tosses, or
begin with HT followed by a sequence ending in HH in the remaining $n - 2$ tosses. This is very similar to our reasoning in the
Waldegrave problem.
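The coin-tossing counts are small enough to enumerate directly. The following Python sketch (illustrative only, with an assumed function name) counts the sequences of n tosses in which HH appears for the first time on the last two tosses.

```python
from itertools import product

def first_hh_counts(n_max):
    """Number of toss sequences of length n = 2, ..., n_max whose first
    occurrence of HH is in the final two tosses."""
    counts = []
    for n in range(2, n_max + 1):
        count = 0
        for seq in product("HT", repeat=n):
            ends_in_hh = seq[-2:] == ("H", "H")
            no_earlier_hh = all(seq[i:i + 2] != ("H", "H") for i in range(n - 2))
            if ends_in_hh and no_earlier_hh:
                count += 1
        counts.append(count)
    return counts

print(first_hh_counts(8))
```

Doubling each count gives the four-player Waldegrave series 2, 2, 4, 6, 10, ....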


For more than four players in the Waldegrave problem, or for waiting for three or more
heads in a row, we find “super Fibonacci” series where we start with 1, 1, 2 and add the
previous three terms to arrive at the next term to find the number of sample points for a
given length of the series.


CONCLUSION 


The Waldegrave problem, for more than three players, presents some very nontrivial
analytical mathematical questions, but we can carry out the calculations with a computer system such as
Mathematica.


ON HUYGENS’ FIRST PROBLEM


Christian Huygens proposed the following problem: player A wins if he casts a sum of 
6 before his opponent, player B, casts a sum of 7 with two fair dice. The problem was 
considered by many mathematicians, including Fermat and James Bernoulli. 

However, various scholars studying the problem presumed quite different orders of 
play, and these orders of play greatly influence the answer to the question. We will consider 
several different orders of play and the probability that A will win the game. 

First, Huygens assumed that the order of play would be ABBAABBAA .... To generalize
the problem a bit, suppose that the probability A wins at a particular trial is $p_1$ and the
consequent probability that A loses at a particular trial is $1 - p_1 = q_1$, and the probability
that B wins at a particular trial is $p_2$ with $q_2 = 1 - p_2$.

A can win the game in two mutually exclusive ways: A can win on trials 1, 4, 8, 12, ...
or A can win on trials 5, 9, 13, ....

The probability A wins on the first sequence is

$P_1 = p_1 + q_1 q_2^2 p_1 + q_1^3 q_2^4 p_1 + q_1^5 q_2^6 p_1 + \cdots = p_1 \left[ 1 + \frac{q_1 q_2^2}{1 - q_1^2 q_2^2} \right]$

while the probability A wins on the second sequence is

$P_2 = q_1^2 q_2^2 p_1 + q_1^4 q_2^4 p_1 + q_1^6 q_2^6 p_1 + \cdots = p_1 \left[ \frac{q_1^2 q_2^2}{1 - q_1^2 q_2^2} \right].$

Then $P(A \text{ wins}) = P_1 + P_2 = p_1 \left[ \frac{1 + q_1 q_2^2}{1 - q_1^2 q_2^2} \right].$

If we let $p_1 = 5/36$ and $p_2 = 1/6$, we find $P(A \text{ wins}) = \frac{10355}{22631} = 0.45756$.

If the game is fair so that $p_1 = p_2 = 1/2$, then $P(A \text{ wins}) = \frac{1}{2} \cdot \frac{1 + (1/2)^3}{1 - (1/2)^4} = \frac{3}{5}$, giving, for
these probabilities, the advantage clearly to A.
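Both the closed form and the order of play can be checked numerically. The following Python sketch (the book's computations are in Mathematica; all names here are illustrative) adds up the probability that A wins on each of A's trials for a given order of play. The alternating order ABAB ... is included for comparison.

```python
from fractions import Fraction

def prob_a_wins(is_a_turn, p1, p2, n_trials=300):
    """Sum of P(A wins on trial t) for t = 1, ..., n_trials, where
    is_a_turn(t) says whether trial t belongs to A."""
    nobody_yet = Fraction(1)  # probability that no one has won before trial t
    total = Fraction(0)
    for t in range(1, n_trials + 1):
        if is_a_turn(t):
            total += nobody_yet * p1
            nobody_yet *= 1 - p1
        else:
            nobody_yet *= 1 - p2
    return total

def huygens_order(t):
    """A, then BB, AA, BB, AA, ..."""
    return t == 1 or ((t - 2) // 2) % 2 == 1

def alternating_order(t):
    """A, B, A, B, ..."""
    return t % 2 == 1

p1, p2 = Fraction(5, 36), Fraction(1, 6)
print(float(prob_a_wins(huygens_order, p1, p2)))
print(float(prob_a_wins(alternating_order, p1, p2)))
```

With 300 trials the truncated sums agree with the closed-form values to well beyond the digits quoted in the text.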


CHANGING THE SUMS FOR THE PLAYERS 


It is possible to compute the probabilities that player A casts a sum of a before player B
casts a sum of b for a or b = 2, 3, ..., 12.

The following table gives all these probabilities and is to be read this way: if a = 4
and b = 6, the probability of A shooting a sum of 4 before B shoots a sum of 6 is
$\frac{26123}{70343} = 0.37137$.

Note that the probabilities for any value of b are the same for values of a summing to 14
since the sum of the tops and the bottoms of two dice total 14.
Decimal Equivalents


a=2or 12: {0.500198, 0.330591, 0.245293, 0.19401, 0.159818, 0.135425, 0.159818, 
0.19401, 0.245293, 0.330591, 0.500198 } 

a=3 or 11: {0.670214, 0.500816, 0.397792, 0.328601, 0.278987, 0.241721, 
0.278987, 0.328601, 0.397792, 0.500816, 0.670214} 

a=4 or 10: {0.755931, 0.604689, 0.501887, 0.427551, 0.371366, 0.327466, 
0.371366, 0.427551, 0.501887, 0.604689, 0.755931} 

a=5 or 9: {0.807646, 0.674755, 0.577551, 0.503448, 0.44516, 0.398176, 0.44516,
0.503448, 0.577551, 0.674755, 0.807646 } 

a=6 or 8: {0.842282, 0.725268, 0.635102, 0.563581, 0.505538, 0.457558, 
0.505538, 0.563581, 0.635102, 0.725268, 0.842282} 

a=7: {0.867132, 0.76346, 0.680408, 0.612462, 0.555919, 0.508197, 0.555919, 
0.612462, 0.680408, 0.76346, 0.867132} 

Here is a plot of the probabilities for various values of b if a = 5. 


[Plot of the probability against b for a = 5.]


A contour plot is also interesting: 


[Contour plot of the probability as a function of a and b.]
Another order


If we change the order of play to ABABAB ..., then

$P(A \text{ wins}) = p_1 + q_1 q_2 p_1 + (q_1 q_2)^2 p_1 + \cdots = \frac{p_1}{1 - q_1 q_2}.$

If we let $p_1 = 5/36$ and $p_2 = 1/6$, we find $P(A \text{ wins}) = \frac{30}{61} = 0.491803$.

Were the game fair, then $P(A \text{ wins}) = \frac{2}{3}$.

One could of course calculate a table similar to that earlier for all possible values of a
and b.


Bernoulli's Sequence 


Bernoulli proposed the sequence ABBAABBBAAABBBBAAAABBBBB ... , which makes the 
problem much more difficult. 


We look at some possibilities until we see a pattern.
First, A can win on trials 1, 3, 7, 13, 21, ... and this has probability

$P_1 = p_1 + q_1 q_2 p_1 + q_1^3 q_2^3 p_1 + q_1^6 q_2^6 p_1 + \cdots = \sum_{k=0}^{\infty} (q_1 q_2)^{k(k+1)/2} p_1.$

A could also win on trials 4, 8, 14, 22, ... and this has probability

$P_2 = q_1 q_2 q_1 p_1 + q_1^3 q_2^3 q_1 p_1 + q_1^6 q_2^6 q_1 p_1 + \cdots = \sum_{k=1}^{\infty} (q_1 q_2)^{k(k+1)/2} q_1 p_1.$

And the probability A wins on trials 9, 15, 23, ... is

$P_3 = q_1^3 q_2^3 q_1^2 p_1 + q_1^6 q_2^6 q_1^2 p_1 + \cdots = \sum_{k=2}^{\infty} (q_1 q_2)^{k(k+1)/2} q_1^2 p_1$

and one could go on but the pattern is evident.

We see then that $P(A \text{ wins}) = \sum_{j=0}^{\infty} \sum_{k=j}^{\infty} (q_1 q_2)^{k(k+1)/2} q_1^j p_1.$

The series converges, but quite a lot of arithmetic is involved. Taking 10 terms in the
series gives P(A wins) = 0.490093, and this is the same result as taking 30 terms in the series.


Were the game fair, then P(A wins) = 3, so the last two series give exactly the same 
probability that A wins the game. 
Appendix A


Use of Mathematica in Probability 
and Statistics 


Reference is often made in the text to a computer algebra system. Mathematica was used 
for much of the work in this book, but other computer algebra systems are also capable 
of doing much of the work we do. We give here examples of the use of all the commands 
in Mathematica that have been used in the examples and graphs in the text, but we do not 
show the creation of every graph. In addition, no attempt is made to carry out our tasks in the 
most efficient manner; the reader will find that Mathematica offers many paths to the same 
result; the reader is encouraged to explore other ways to achieve the results shown here. 

The material here is referred to by chapter in the text and by examples within that 
chapter. We often do not repeat the conditions of the examples, so the reader should read 
the text before studying the solutions. No attempt is made here to explain Mathematica 
syntax; the reader is directed to the extensive help pages for each of the commands we 
show here. 

The text contains many graphs which are not reproduced here. Entries in Mathematica 
appear in bold-face type; the responses follow in ordinary type. 

A simple calculation is necessary to load the Mathematica kernel. Then all the commands
shown here will work as shown; no other knowledge of or experience with Mathematica
is necessary. This appendix is in actuality a Mathematica notebook and will run on a
computer exactly as shown here with the exception of examples which use random samples;
they will vary each time the program is run.


CHAPTER ONE 


Section 1.1 Discrete Sample Spaces 


The Fibonacci recursion is $a_n = a_{n-1} + a_{n-2}$, $a_1 = 1$, $a_2 = 1$. Values for the
recursion can be found directly using the recursion.

In[1]= a[n_] := a[n] = a[n - 1] + a[n - 2]

In[2]= a[1] = 1
Out[2]= 1

In[3]= a[2] = 1
Out[3]= 1

In[4]= Table[a[n], {n, 1, 15}]
Out[4]= {1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610}


Probability: An Introduction with Statistical Applications, Second Edition. John J. Kinney. 

The 25th Fibonacci number is 

In[5]= a[25] 
Out[5]= 75 025 
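
As a cross-check on the recursion, the values can be compared with Mathematica's built-in Fibonacci function. This is only a sketch of an alternative route, not the method used in the text. Note that the definition a[n_] := a[n] = ... stores each value once it has been computed, so later calls do not repeat the recursion.

```mathematica
(* Built-in Fibonacci numbers, for comparison with the recursive a[n] *)
Table[Fibonacci[n], {n, 1, 15}]
(* {1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610} *)

Fibonacci[25]
(* 75025 *)
```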


Section 1.4 Conditional Probability and Independence 


Example 1.4.4 




Figure 1.9 shows the graph of P(A|T+) as a function of P(A). It was drawn as follows: 


In[6]= f[p_] := (20000 p)/(1 + 19999 p) 

In[7]= Plot[f[p], {p, 0, 1}, Frame -> True, FrameLabel -> {"P(A)", "P(A|T+)"}, 
LabelStyle -> {FontFamily -> "Helvetica-Bold"}] 

Out[7]= [Figure 1.9: graph of P(A|T+) against P(A) for P(A) from 0.0 to 1.0] 

Figure 1.10 was drawn as follows: 

In[8]= Plot[(0.95*r)/(0.05 + 0.90*r), {r, 0, 1}, 
AxesLabel -> {"r=P(A)", "P(A|T+)"}, 
LabelStyle -> {FontFamily -> "Helvetica-Bold"}] 

Out[8]= [Figure 1.10: graph of P(A|T+) against r = P(A) for r from 0 to 1] 




This section also shows a three-dimensional graph of P(A|T+) as a function of both the 
incidence rate of the disease, r, and p = P(T+|A). This was done as follows: 

In[9]= f[r_, p_] := r*p/(r*p + (1 - r)*(1 - p)) 

In[10]= Plot3D[f[r, p], {r, 0, 1}, {p, 0, 1}, 
AxesLabel -> {"r=P(A)", "P(T+|A)", "P(A|T+)"}, ViewPoint -> {0.965, -2.553, 2.000}, 
LabelStyle -> (FontFamily -> "Helvetica-Bold"), PlotPoints -> 50] 

Out[10]= [Three-dimensional graph of P(A|T+) over 0 <= r <= 1 and 0 <= p <= 1] 


Example 1.5.1 The Birthday Problem 


The table with exact values of P(A) was constructed with this instruction: 


In[11]= probs = Table[{i, 1 - Product[(366 - r)/365, {r, i}]}, {i, 1, 40}] // N 
Out[11]= {{1., 0.}, {2., 0.00273973}, {3., 0.00820417}, {4., 0.0163559}, 

{5., 0.0271356}, {6., 0.0404625}, {7., 0.0562357}, {8., 0.0743353}, 

{9., 0.0946238}, {10., 0.116948}, {11., 0.141141}, {12., 0.167025}, 


{13., 0.19441}, {14., 0.223103}, {15., 0.252901}, {16., 0.283604}, 
{17., 0.315008}, {18., 0.346911}, {19., 0.379119}, {20., 0.411438}, 
{21., 0.443688}, {22., 0.475695}, {23., 0.507297}, {24., 0.538344}, 
{25., 0.5687}, {26., 0.598241}, {27., 0.626859}, {28., 0.654461}, 

{29., 0.680969}, {30., 0.706316}, {31., 0.730455}, {32., 0.753348}, 
{33., 0.774972}, {34., 0.795317}, {35., 0.814383}, {36., 0.832182}, 
{37., 0.848734}, {38., 0.864068}, {39., 0.87822}, {40., 0.891232}} 
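
The table can also be searched for the smallest group size at which a shared birthday becomes more likely than not. The sketch below assumes a helper name, pBirthday, that does not appear in the text, and it uses SelectFirst, which is available only in more recent versions of Mathematica.

```mathematica
(* pBirthday is a hypothetical helper: probability of at least one shared birthday among n people *)
pBirthday[n_] := 1 - Product[(366 - r)/365, {r, n}]

(* Smallest n between 1 and 40 with probability at least 1/2 *)
SelectFirst[Range[40], pBirthday[#] >= 1/2 &]
(* 23 *)

N[pBirthday[23]]
(* 0.507297 *)
```

The result agrees with the table, where the probability first exceeds 1/2 at n = 23.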


The graph in Figure 1.13 was drawn with these commands: 


In[12]= values = Table[i, {i, 1, 40}] 

Out[12]= {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 
39, 40} 

In[13]= ListPlot[probs, Frame -> True, FrameLabel -> {"n", "Probability"}, Ticks -> 
{values, Automatic}, PlotLabel -> ("Birthday Problem"), LabelStyle -> 
(FontFamily -> "Helvetica-Bold")] 

Out[13]= [Figure 1.13: "Birthday problem", probability plotted against n for n from 0 to 40] 

Section 1.7 Counting Techniques 
Mathematica does exact arithmetic: 
In[14]= 52 ! 
Out[14]= 80 658 175 170 943 878 571 660 636 856 403 766 975 289 505 440 883 277 
824 000 000 000 000 
This may convince the reader that a random order of a deck of cards is truly rare! 
The number of permutations of r objects selected from n distinct objects is n!/(n - r)!. 
For example, the number of distinct arrangements of 30 objects chosen from 56 objects is 
56!/26!. Mathematica will evaluate this exactly. 
In[15]= 56! / 26! 
Out[15]= 1 762 989 441 479 047 465 097 977 043 769 075 758 530 560 000 000 
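
The same count can be written with the built-in FactorialPower function, which multiplies n (n - 1) ... (n - r + 1) directly. This is offered only as an alternative way to write the calculation.

```mathematica
(* Falling factorial 56 * 55 * ... * 27 compared with the quotient of factorials *)
FactorialPower[56, 30] == 56!/26!
(* True *)
```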
Here are some examples of permutations. 
In[16]= Permutations[{a, b, c}] 
Out[16]= {{a, b, c}, {a, c, b}, {b, a, c}, {b, c, a}, {c, a, b}, {c, b, a}} 
If some of the objects are alike, only the distinct permutations are returned: 
In[17]= perms = Permutations[{a, a, b, b, c}] 
Out[17]= {{a, a, b, b, c}, {a, a, b, c, b}, {a, a, c, b, b}, {a, b, a, b, c}, 
{a, b, a, c, b}, {a, b, b, a, c}, {a, b, b, c, a}, {a, b, c, a, b}, 
{a, b, c, b, a}, {a, c, a, b, b}, {a, c, b, a, b}, {a, c, b, b, a}, 
{b, a, a, b, c}, {b, a, a, c, b}, {b, a, b, a, c}, {b, a, b, c, a}, 
{b, a, c, a, b}, {b, a, c, b, a}, {b, b, a, a, c}, {b, b, a, c, a}, 
{b, b, c, a, a}, {b, c, a, a, b}, {b, c, a, b, a}, {b, c, b, a, a}, 
{c, a, a, b, b}, {c, a, b, a, b}, {c, a, b, b, a}, {c, b, a, a, b}, 
{c, b, a, b, a}, {c, b, b, a, a}} 
In[18]= Length[ perms] 
Out[18]= 30 
We can check that this is 

In[19]= 5!/(2! 2! 1!) 
Out[19]= 30 

A sample of 3 random permutations of the letters in the set {a, a, b, b, c} can be found as 
follows: 

In[20]= RandomChoice[Permutations[{a, a, b, b, c}], 3] 
Out[20]= {{a, b, a, b, c}, {a, c, b, b, a}, {a, c, b, b, a}} 

The number of combinations is found using the function Binomial[n, r] = n!/(r! (n - r)!). 
The number of distinct poker hands is: 

In[21]= Binomial[52, 5] 
Out[21]= 2 598 960 

Example 1.7.2 

In[22]= Binomial[3, 1]*Binomial[8, 4]/Binomial[11, 5] 
Out[22]= 5/11 

Example 1.7.3 The Matching Problem 


If there are 4 numbers to be permuted, we can generate all 4! = 24 permutations of these 4 
digits as follows: 

In[23]= Permutations[{1, 2, 3, 4}] 
Out[23]= {{1, 2, 3, 4}, {1, 2, 4, 3}, {1, 3, 2, 4}, {1, 3, 4, 2}, {1, 4, 2, 3}, 
{1, 4, 3, 2}, {2, 1, 3, 4}, {2, 1, 4, 3}, {2, 3, 1, 4}, {2, 3, 4, 1}, 
{2, 4, 1, 3}, {2, 4, 3, 1}, {3, 1, 2, 4}, {3, 1, 4, 2}, {3, 2, 1, 4}, 
{3, 2, 4, 1}, {3, 4, 1, 2}, {3, 4, 2, 1}, {4, 1, 2, 3}, {4, 1, 3, 2}, 
{4, 2, 1, 3}, {4, 2, 3, 1}, {4, 3, 1, 2}, {4, 3, 2, 1}} 
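
Once the permutations are listed, those with no number in its own position can be counted directly. The sketch below assumes that the quantity of interest in the matching problem is the number of permutations with no match; the name perms4 is introduced here only for illustration.

```mathematica
(* perms4 is a hypothetical name for the list of all permutations of {1, 2, 3, 4} *)
perms4 = Permutations[Range[4]];

(* A permutation p has no match when p - {1, 2, 3, 4} contains no zero *)
Count[perms4, p_ /; FreeQ[p - Range[4], 0]]
(* 9 *)

(* The built-in Subfactorial gives the same count *)
Subfactorial[4]
(* 9 *)

(* Probability of no match among the 4! = 24 equally likely permutations *)
N[Subfactorial[4]/4!]
(* 0.375 *)
```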


If we would like all the permutations of length 3 chosen from 5 integers, this can be done as 
follows: 

In[24]= Permutations[{1, 2, 3, 4, 5}, {3}] 
Out[24]= {{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 2}, {1, 3, 4}, {1, 3, 5}, 
{1, 4, 2}, {1, 4, 3}, {1, 4, 5}, {1, 5, 2}, {1, 5, 3}, {1, 5, 4}, 
{2, 1, 3}, {2, 1, 4}, {2, 1, 5}, {2, 3, 1}, {2, 3, 4}, {2, 3, 5}, 
{2, 4, 1}, {2, 4, 3}, {2, 4, 5}, {2, 5, 1}, {2, 5, 3}, {2, 5, 4}, 
{3, 1, 2}, {3, 1, 4}, {3, 1, 5}, {3, 2, 1}, {3, 2, 4}, {3, 2, 5}, 
{3, 4, 1}, {3, 4, 2}, {3, 4, 5}, {3, 5, 1}, {3, 5, 2}, {3, 5, 4}, 
{4, 1, 2}, {4, 1, 3}, {4, 1, 5}, {4, 2, 1}, {4, 2, 3}, {4, 2, 5}, 
{4, 3, 1}, {4, 3, 2}, {4, 3, 5}, {4, 5, 1}, {4, 5, 2}, {4, 5, 3}, 
{5, 1, 2}, {5, 1, 3}, {5, 1, 4}, {5, 2, 1}, {5, 2, 3}, {5, 2, 4}, 
{5, 3, 1}, {5, 3, 2}, {5, 3, 4}, {5, 4, 1}, {5, 4, 2}, {5, 4, 3}} 

In[25]= Length[%] 
Out[25]= 60 
This agrees with the count 5 · 4 · 3 = 60 for the length of this list of permutations. 


Example 1.7.5 Race Cars 


Section 1.7 also shows graphs of the probability distribution of the median in the race car 
example. They were drawn this way: 

In[26]= med1[k_] := (k - 1)*(10 - k)/Binomial[10, 3] 

In[27]= ListPlot[Table[med1[k], {k, 2, 9}], Frame -> True, 
FrameLabel -> {"Median", "Probability"}, PlotLabel -> ("Race Car Problem"), 
LabelStyle -> (FontFamily -> "Helvetica-Bold")] 

Out[27]= [Graph: "Race car problem", probability plotted against the median] 

In[28]= med[k_] := Binomial[1, 1]*Binomial[k - 1, 4]*Binomial[100 - k, 4]/ 
Binomial[100, 9] 

In[29]= ListPlot[Table[med[k], {k, 5, 96}], Frame -> True, 
FrameLabel -> {"Median", "Probability"}, PlotLabel -> ("Race Car Problem"), 
LabelStyle -> (FontFamily -> "Helvetica-Bold")] 

Out[29]= [Graph: "Race car problem", probability plotted against the median] 
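
A quick check that each of the two functions is a probability distribution is to sum it over its range. The definitions are repeated below, without the factor Binomial[1, 1] = 1, so that the sketch can be run on its own.

```mathematica
(* Distribution of the median of 3 items chosen from 10 *)
med1[k_] := (k - 1)*(10 - k)/Binomial[10, 3]

(* Distribution of the median of 9 items chosen from 100 *)
med[k_] := Binomial[k - 1, 4]*Binomial[100 - k, 4]/Binomial[100, 9]

Sum[med1[k], {k, 2, 9}]
(* 1 *)

Sum[med[k], {k, 5, 96}]
(* 1 *)
```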
CHAPTER TWO 


Many discrete probability distributions, including all those in Chapter Two, are contained 
in Mathematica as defined functions. We give some examples of the use of these functions, 
of drawing graphs, and of drawing random samples from these distributions. 


Section 2.1 Random Variables 


Example 2.1.1 


We show a random sample of 100 selected from the discrete uniform distribution 
P(X = x) = 1/n, x = 1, 2, 3, ..., n. The starting and ending values for x must be specified. We 
take the range from 1 to 6 to simulate tosses of a fair die. 

In[30]= data = RandomVariate[DiscreteUniformDistribution[{1, 6}], 100] 
Out[30]= [a list of 100 integers from 1 to 6; the values vary from run to run] 

The data can then be organized by counting the frequency with which each integer occurs. 

In[31]= freq = BinCounts[data, {1, 7, 1}] 
Out[31]= {14, 18, 15, 23, 15, 15} 

Now we can draw a bar chart of the frequencies: 

In[32]= BarChart[freq, AxesLabel -> {"Face", "Frequency"}, 
ChartLabels -> {1, 2, 3, 4, 5, 6}] 

Out[32]= [Bar chart of frequency against face] 


Example 2.1.2 


Sampling from the die loaded so that the probability that a face appears is proportional to the 
face is a bit more complex than sampling from a fair die. We sample from a discrete uniform 
distribution with values from 1 to 21; the value 1 becomes a one on the die; the next two 
values, namely 2 or 3, become a 2 on the loaded die; the next three values are 4, 5, or 6 and 
they become a 3 on the loaded die, and so on. 

In[33]= data1 = RandomVariate[DiscreteUniformDistribution[{1, 21}], 200]; 

The semicolon suppresses printing of the output. 

In[34]= freq1 = BinCounts[data1, {1, 22, 1}] 
Out[34]= {11, 14, 11, 8, 8, 10, 13, 16, 9, 10, 9, 8, 15, 5, 7, 10, 7, 5, 7, 7, 10} 

Now we collate the data: 

In[35]= orgdata = Table[Take[freq1, {(1/2)*(2 - m + m^2), m*(m + 1)/2}], 
{m, 1, 6}] 
Out[35]= {{11}, {14, 11}, {8, 8, 10}, {13, 16, 9, 10}, {9, 8, 15, 5, 7}, {10, 7, 5, 
7, 7, 10}} 
In[36]= orgfreq = Apply[Plus, orgdata, {1}] 
Out[36]= {11, 25, 26, 48, 44, 46} 
In[37]= BarChart[{11, 25, 26, 48, 44, 46}, 
AxesLabel -> {"Face", "Frequency"}, ChartLabels -> {1, 2, 3, 4, 5, 6}] 

Out[37]= [Bar chart of frequency against face for the loaded die] 
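
A shorter route to the same kind of sample is weighted sampling with RandomChoice, which accepts a list of weights for the items to be chosen. This is a sketch of an alternative, not the construction used in the text; the name loaded is introduced only for illustration, and the counts will differ from run to run.

```mathematica
(* loaded is a hypothetical name: 200 tosses of a die whose face k has weight k *)
loaded = RandomChoice[Range[6] -> Range[6], 200];   (* weights -> faces *)

(* Frequencies of the faces 1 through 6 *)
BinCounts[loaded, {1, 7, 1}]
```

The weights need not sum to 1; RandomChoice normalizes them, so face k is chosen with probability k/21.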


Example 2.1.3 
The probability distribution when two dice are thrown is 
P(X = x) = P(X = 14 - x) = (x - 1)/36 for x = 1, 2, 3, 4, 5, 6, 7. 

A graph of this function is shown in Figure 2.2 and can be generated by the following 
commands: 

In[38]= prob1 = Table[(x - 1)/36, {x, 2, 7}] 
Out[38]= {1/36, 1/18, 1/12, 1/9, 5/36, 1/6} 

In[39]= prob2 = Table[(14 - x)/36, {x, 9, 13}] 
Out[39]= {5/36, 1/9, 1/12, 1/18, 1/36} 

In[40]= ts = Table[{i, i + 1}, {i, 1, 11}] 
Out[40]= {{1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 6}, {6, 7}, {7, 8}, {8, 9}, {9, 10}, 
{10, 11}, {11, 12}} 

In[41]= ListPlot[Flatten[{prob1, prob2}, 1], Frame -> True, AxesOrigin -> {0, 0}, 
FrameLabel -> {"Sum", "Probability"}, PlotRange -> {0, 0.20}, 
FrameTicks -> {ts, Automatic}, LabelStyle -> (FontFamily -> "Helvetica-Bold")] 

Out[41]= [Graph of probability against the sum on two dice] 


Another way to do this is to use a generating function: 

In[42]= g[t_] := Sum[(1/6) t^i, {i, 1, 6}] 
g[t] 
Out[42]= t/6 + t^2/6 + t^3/6 + t^4/6 + t^5/6 + t^6/6 

The coefficients of g[t] give the probability that the die shows a particular face. The 
coefficients of g[t]^2 give the probabilities of the sum on two dice: 

In[43]= Expand[g[t]^2] 
Out[43]= t^2/36 + t^3/18 + t^4/12 + t^5/9 + (5 t^6)/36 + t^7/6 + (5 t^8)/36 + t^9/9 + 
t^10/12 + t^11/18 + t^12/36 

In[44]= ListPlot[Drop[CoefficientList[Expand[g[t]^2], t], 2], 
Frame -> True, AxesOrigin -> {0, 0}, FrameLabel -> {"Sum", 
"Probability"}, PlotLabel -> "Sums on Two Fair Dice", PlotRange -> 
{0, 0.18}, LabelStyle -> (FontFamily -> "Helvetica-Bold")] 

Out[44]= [Figure 2.2: "Sums on two fair dice", probability plotted against the sum] 
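
The coefficients found from the generating function can be checked by listing all 36 outcomes for two dice and tallying their sums. This is a sketch of an independent check; the name sums is introduced only for illustration.

```mathematica
(* sums is a hypothetical name: the 36 sums of the ordered pairs from two dice *)
sums = Total /@ Tuples[Range[6], 2];

(* Each pair is {sum, number of outcomes giving that sum} *)
Sort[Tally[sums]]
(* {{2, 1}, {3, 2}, {4, 3}, {5, 4}, {6, 5}, {7, 6}, {8, 5}, {9, 4}, {10, 3}, {11, 2}, {12, 1}} *)
```

Dividing each count by 36 gives the probabilities 1/36, 1/18, 1/12, and so on.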


The coefficients of g[t]^3 give the probabilities of sums on three dice; here is a graph of 
the result: 

In[45]= ListPlot[Drop[CoefficientList[Expand[g[t]^3], t], 3], 
Frame -> True, AxesOrigin -> {1, 0}, FrameLabel -> {"Sum", 
"Probability"}, PlotLabel -> "Sums on Three Fair Dice", PlotRange -> 
{0, 0.13}, LabelStyle -> (FontFamily -> "Helvetica-Bold")] 

Out[45]= [Graph: "Sums on three fair dice", probability plotted against the sum] 


Section 2.3 Expected Values of Discrete Random 
Variables 


To find the mean and variance of the sums on three dice we proceed as follows: 

In[46]= probsfor3 = Drop[CoefficientList[Expand[g[t]^3], t], 3] 
Out[46]= {1/216, 1/72, 1/36, 5/108, 5/72, 7/72, 25/216, 1/8, 1/8, 25/216, 7/72, 5/72, 
5/108, 1/36, 1/72, 1/216} 

In[47]= mean = Sum[i*probsfor3[[i - 2]], {i, 3, 18}] 
Out[47]= 21/2 

In[48]= variance = Sum[i^2*probsfor3[[i - 2]], {i, 3, 18}] - mean^2 
Out[48]= 35/4 
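
Because the three dice are independent, the mean and variance of the sum are three times those of a single die. The following sketch uses the built-in discrete uniform distribution as a check; the name die is introduced only for illustration.

```mathematica
(* die is a hypothetical name for the distribution of a single fair die *)
die = DiscreteUniformDistribution[{1, 6}];

3 Mean[die]
(* 21/2 *)

3 Variance[die]
(* 35/4 *)
```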


Section 2.4 Binomial Distribution 


We show how to take a random sample of 10 observations from a binomial distribution 
with n = 20 and p = 3/4: 

In[49]= bindata = RandomVariate[BinomialDistribution[20, 3/4], 10] 
Out[49]= {14, 18, 15, 12, 15, 15, 13, 16, 14, 13} 

In[50]= Histogram[bindata, 6, AxesLabel -> {"Sum", "Probability"}, LabelStyle -> 
(FontFamily -> "Helvetica-Bold")] 

Out[50]= [Histogram of the sample] 


Figure 2.9 can be produced as follows: 

In[51]= newxs = Table[{i, 33 + i}, {i, 1, 40, 5}] 
Out[51]= {{1, 34}, {6, 39}, {11, 44}, {16, 49}, {21, 54}, {26, 59}, {31, 64}, {36, 69}} 

In[52]= ListPlot[Table[PDF[BinomialDistribution[100, 1/2], x], {x, 34, 64}], 
Frame -> True, AxesOrigin -> {0, 0}, FrameLabel -> {"X", 
"Probability"}, PlotLabel -> "Binomial Distribution, n=100, p=1/2", 
LabelStyle -> (FontFamily -> "Helvetica-Bold"), FrameTicks -> {{None, 
None}, {newxs, Automatic}}] 

Out[52]= [Figure 2.9: "Binomial distribution, n=100, p=1/2", probability plotted against X for X from 34 to 64] 

Example Flipping a Loaded Coin 


Here we simulate 1000 flips of a coin loaded to come up heads with probability 2/5. 

In[53]= biasdata = RandomVariate[BinomialDistribution[1, 2/5], 1000]; 
In[54]= biasfreq = BinCounts[biasdata, {0, 2, 1}] 
Out[54]= {602, 398} 
In[55]= biasvalues = {0, 1} 
Out[55]= {0, 1} 

In[56]= BarChart[Transpose[{biasfreq, biasvalues}], 
AxesLabel -> {"Face", "Frequency"}, LabelStyle -> (FontFamily -> 
"Helvetica-Bold")] 

Out[56]= [Bar chart of frequency against face] 

The bars here represent heads and tails. 
Mathematica knows the mean and variance of the binomial distribution (as well as those 
moments for many other probability distributions): 

In[57]= Mean[BinomialDistribution[n, p]] 
Out[57]= n p 
In[58]= Variance[BinomialDistribution[n, p]] 
Out[58]= n (1 - p) p 


Section 2.6 Some Statistical Considerations 


The confidence intervals in Figure 2.11 were generated and plotted as follows. 

In[59]= soln = Simplify[Expand[Solve[100 p == X + 2*Sqrt[100 p (1 - p)], p]]] 
Out[59]= {{p -> (1/520) (10 + 5 X - Sqrt[100 + 100 X - X^2])}, 
{p -> (1/520) (10 + 5 X + Sqrt[100 + 100 X - X^2])}} 

In[60]= leftend = p /. soln[[1]] 
Out[60]= (1/520) (10 + 5 X - Sqrt[100 + 100 X - X^2]) 

In[61]= rightend = p /. soln[[2]] 
Out[61]= (1/520) (10 + 5 X + Sqrt[100 + 100 X - X^2]) 

In[62]= chartdata = {40, 44, 29, 43, 43, 42, 39, 40, 43, 42, 36, 44, 35, 39, 42} 
Out[62]= {40, 44, 29, 43, 43, 42, 39, 40, 43, 42, 36, 44, 35, 39, 42} 

In[63]= endpts = 
Table[{leftend, rightend} /. X -> chartdata[[i]], {i, 1, 
Length[chartdata]}] // N 
Out[63]= {{0.307692, 0.5}, {0.344931, 0.539685}, {0.208721, 0.387433}, {0.335563, 
0.529822}, {0.335563, 0.529822}, {0.326233, 0.519921}, {0.298482, 
0.48998}, {0.307692, 0.5}, {0.335563, 0.529822}, {0.326233, 0.519921}, 
{0.271095, 0.459674}, {0.344931, 0.539685}, {0.26205, 0.449488}, 
{0.298482, 0.48998}, {0.326233, 0.519921}} 
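
The two solutions can also be packaged as a single function of the observed count. This is a sketch under the same assumption as the Solve command (a sample of size 100 and limits two standard deviations from the mean); the name ci is introduced only for illustration.

```mathematica
(* ci is a hypothetical helper: both interval endpoints for an observed count x out of 100 *)
ci[x_] := {10 + 5 x - Sqrt[100 + 100 x - x^2], 10 + 5 x + Sqrt[100 + 100 x - x^2]}/520

N[ci[40]]
(* {0.307692, 0.5} *)
```

Applying ci to each element of chartdata, for example with N[ci /@ chartdata], gives the same list of endpoints as the Table command.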

vert = Show[Graphics[Line[{{13.3, 0}, {13.3, 16}}]]]; 

In[66]= Show[Graphics[Table[Line[{{10*endpts[[i]][[1]], i}, 
{50*endpts[[i]][[2]], i}}], {i, 1, Length[endpts]}]], vert, Frame -> True, 
FrameTicks -> {{{2, 0.25}, {5.8, 0.3}, {9.6, 0.35}, {13.3, 0.40}, 
{17.1, 0.45}, {20.9, 0.50}, {24.7, 0.55}}, {0, 2, 4, 6, 8, 10, 12, 14}}] 

Out[66]= [Figure 2.11: the 15 confidence intervals drawn as horizontal line segments, with a vertical line at 0.40; horizontal axis from 0.25 to 0.55] 


Section 2.7 Hypothesis Tests 


The alpha and beta errors in this section are sums of binomial probabilities. 


In[67]= alpha = Sum[PDF[BinomialDistribution[20, 0.2], x], { x, 9, 20}] 
Out [67]= 0.00998179 

In[68]= beta = Sum [PDF[BinomialDistribution[20, 0.3], x], {x, 0, 8}] 
Out [68]= 0.886669 
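
The same error probabilities can be written with the cumulative distribution function instead of explicit sums. This is a sketch of an equivalent calculation; the results should agree with the values above up to rounding.

```mathematica
(* alpha = P(X >= 9) when p = 0.2, written as 1 - P(X <= 8) *)
alpha = 1 - CDF[BinomialDistribution[20, 0.2], 8]

(* beta = P(X <= 8) when p = 0.3 *)
beta = CDF[BinomialDistribution[20, 0.3], 8]
```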


Section 2.9 Geometric and Negative Binomial 
Distributions 


We show how to draw Figure 2.13. 

In[69]= Negbin[x_, r_] := Binomial[x - 1, r - 1]*((1/2)^r)*((1/2)^(x - r)) 

In[70]= ListPlot[Table[Negbin[x, 5], {x, 5, 25}], AxesLabel -> {"x", "Probability"}, 
PlotRange -> {0, 0.14}, PlotLabel -> "Negative Binomial 
Distribution, r=5, p=1/2", AxesOrigin -> {0, 0}, LabelStyle -> 
(FontFamily -> "Helvetica-Bold"), Ticks -> {{{1, 5}, {5, 10}, {10, 15}, {15, 20}, 
{20, 25}, {25, 30}}, Automatic}] 

Out[70]= [Figure 2.13: "Negative Binomial Distribution, r=5, p=1/2", probability plotted against x] 
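
Mathematica also has a built-in NegativeBinomialDistribution, but it counts the number of failures before the r-th success rather than the total number of trials. The sketch below compares the two forms, shifting the argument by r = 5; the lowercase name negbin is used only to keep the example self-contained.

```mathematica
(* negbin: probability that the r-th success occurs on trial x, with p = 1/2 *)
negbin[x_, r_] := Binomial[x - 1, r - 1]*(1/2)^r*(1/2)^(x - r)

(* Built-in form: x - 5 failures before the 5th success *)
Table[negbin[x, 5] == PDF[NegativeBinomialDistribution[5, 1/2], x - 5], {x, 5, 10}]
(* {True, True, True, True, True, True} *)
```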


Section 2.10 Hypergeometric Distribution 


Section 2.10.3 


Figure 2.18 shows a hypergeometric distribution with n = 30, D = 400, and N = 1000. This 
distribution can be plotted with the following commands. 

In[71]= hyperfcn = PDF[HypergeometricDistribution[30, 400, 1000], x] 
Out[71]= (Binomial[400, x] Binomial[600, 30 - x])/ 
2 429 608 192 173 745 103 270 389 838 576 750 719 302 222 606 198 631 438 800 
for 0 <= x <= 30, and 0 otherwise 

Now we generate a table of values of the function and then plot these values. 

In[72]= ListPlot[Table[hyperfcn, {x, 0, 24}], Frame -> True, FrameLabel -> {"x", 
"Probability"}, PlotRange -> {0, 0.155}, PlotLabel -> "Hypergeometric 
Distribution, N=1000, n=30, D=400", AxesOrigin -> {0, 0}, LabelStyle -> 
(FontFamily -> "Helvetica-Bold")] 

Out[72]= [Figure 2.18: "Hypergeometric distribution, N=1000, n=30, D=400", probability plotted against x] 
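
The mean of this hypergeometric distribution is available directly and equals nD/N = 30(400)/1000. The following is only a short check using the built-in functions.

```mathematica
(* Mean of the hypergeometric distribution with n = 30, D = 400, N = 1000 *)
Mean[HypergeometricDistribution[30, 400, 1000]]
(* 12 *)

(* The variance is found in the same way *)
Variance[HypergeometricDistribution[30, 400, 1000]]
```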


Section 2.11 Acceptance Sampling 


We discussed a double sampling plan in this section. Here are the graphs showing 
the probability the lot is accepted (Figure 2.23) and the average outgoing quality 
(Figure 2.24). 

In[73]= probacc = Sum[Binomial[40, x]*Binomial[460, 50 - x]/Binomial[500, 50], 
{x, 0, 3}] + Sum[(Binomial[40, x]*Binomial[460, 50 - x])*Binomial[410 
+ x, 30]/(Binomial[500, 50]*Binomial[450, 30]), {x, 4, 5}] // N 
Out[73]= 0.445334 

In[74]= pacc[y_] := 
Sum[Binomial[y, x]*Binomial[500 - y, 50 - x]/Binomial[500, 50], 
{x, 0, 3}] + Sum[(Binomial[y, x]*Binomial[500 - y, 50 - x])* 
Binomial[450 - y + x, 30]/(Binomial[500, 50]*Binomial[450, 30]), 
{x, 4, 5}] 

In[75]= pacc[40] // N 
Out[75]= 0.445334 

In[76]= ListPlot[Table[pacc[y], {y, 0, 70}], 
LabelStyle -> (FontFamily -> "Helvetica-Bold"), AxesLabel -> {"D", 
"P(Accept)"}] 

Out[76]= [Figure 2.23: P(Accept) plotted against D] 

In[77]= aoq[y_] := 
Sum[(y - x)*Binomial[y, x]*Binomial[500 - y, 50 - x]/Binomial[500, 
50], {x, 0, 3}] + Sum[(y - x)*(Binomial[y, x]*Binomial[500 - y, 
50 - x])* 
Binomial[450 - y + x, 30]/(Binomial[500, 50]*Binomial[450, 30]), 
{x, 4, 5}] 

In[78]= ListPlot[Table[aoq[y], {y, 0, 100}], LabelStyle -> (FontFamily -> 
"Helvetica-Bold"), AxesLabel -> {"D", "AOQ"}] 

Out[78]= [Figure 2.24: AOQ plotted against D for D from 0 to 100] 


In[79]= Max[Table[aoq[y], {y, 0, 100}]] // N 

Out[79]= 19.3036 

In[80]= Table[{y, aoq[y]}, {y, 27, 31}] // N 

Out[80]= {{27., 19.1317}, {28., 19.2475}, {29., 19.3036}, {30., 19.3018}, 
{31., 19.2445}} 


This shows the maximum to be at D = 29. 


Section 2.13 Poisson Random Variable 


Here is the way Figure 2.26 was drawn. 


In[81]= ListPlot[Table[PDF[PoissonDistribution[3], x], {x, 0, 10}], PlotLabel -> 
"Poisson Distribution with λ=3", Frame -> True, LabelStyle -> 
(FontFamily -> "Helvetica-Bold")] 

Out[81]= [Figure 2.26: "Poisson distribution with λ=3", probability plotted against x for x from 0 to 10] 


CHAPTER THREE 


The standard probability density functions, such as the uniform, exponential, normal, 
chi-squared, gamma, and Weibull distributions (as well as many others), are included in 
Mathematica. Some examples of their use are given here. 


Section 3.1 Continuous Random Variables 


Means and variances for continuous distributions are found directly by integration. Here 
we use the probability density function 3x^2 for x in the interval (0, 1). 


Example 3.1.1 

In[82]= mean = Integrate[x*3 x^2, {x, 0, 1}] 
Out[82]= 3/4 

In[83]= variance = Integrate[(x^2)*3 x^2, {x, 0, 1}] - mean^2 
Out[83]= 3/80 
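
The same results can be obtained by declaring the density as a distribution and asking for its moments. This is a sketch of an alternative that relies on the built-in ProbabilityDistribution function; the name dist is introduced only for illustration.

```mathematica
(* dist is a hypothetical name for the distribution with density 3 x^2 on (0, 1) *)
dist = ProbabilityDistribution[3 x^2, {x, 0, 1}];

Mean[dist]
(* 3/4 *)

Variance[dist]
(* 3/80 *)
```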


Section 3.3 Exponential Distribution 


The probability density function for the exponential distribution can be found as follows. 
The value for λ must be specified. 

In[84]= expdist = PDF[ExponentialDistribution[λ], x] 
Out[84]= E^(-x λ) λ for x >= 0, and 0 otherwise 

Note that the mean is then 1/λ. Probabilities are then found by integration. 


Example 3.3.2 


The probability that X exceeds 2 is given by the following. 

In[85]= Integrate[PDF[ExponentialDistribution[1], x], {x, 2, Infinity}] 
Out[85]= 1/E^2 


Section 3.5 Normal Distribution 


The mean and standard deviation must be specified to determine the normal distribution. 

In[86]= normdisdt = PDF[NormalDistribution[a, b], x] 
Out[86]= E^(-(-a + x)^2/(2 b^2))/(b Sqrt[2 π]) 


Example 3.5.1 


Here we seek the conditional probability that a score exceeds 600, given that it exceeds 500. 
There is no need to transform the scores or to consult a table. 

In[87]= satdist = PDF[NormalDistribution[500, 100], x]; 
In[88]= Integrate[satdist, {x, 600, Infinity}]/Integrate[satdist, {x, 500, 
Infinity}] // N 
Out[88]= 0.317311 
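
The conditional probability can also be written with the cumulative distribution function, which avoids the explicit integrals. This is a sketch of an equivalent calculation; the name sat is introduced only for illustration.

```mathematica
(* sat is a hypothetical name for the distribution of the scores *)
sat = NormalDistribution[500, 100];

(* P(X > 600)/P(X > 500) *)
(1 - CDF[sat, 600])/(1 - CDF[sat, 500]) // N
(* 0.317311 *)
```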


Section 3.6 Normal Approximation to the Binomial 


Direct comparisons with exact binomial probabilities and normal approximations are easily 
done. We use a binomial distribution with n = 500 and p = 0.10. To compare the exact value 
of P(X = 53) with the normal curve we calculate as follows: 


In[89]= PDF[BinomialDistribution[500, 0.10], 53] 
Out[89]= 0.0524484 
In[90]= Integrate[ 
PDF[NormalDistribution[500 * 0.10, Sqrt[500 * 0.10 * 0.90]], x], {x, 52.5, 
53.5}] 
Out[90]= 0.0537716 


Section 3.7 Gamma and Chi-Squared Distributions 


We show here the syntax for calling gamma and chi-squared distributions and we show two 
graphs. The number of degrees of freedom must be specified for the chi-squared distribution, 
while the gamma distribution is characterized by two parameters, r and λ. 

In[91]= gamdist = PDF[GammaDistribution[r, 1/λ], x] 
Out[91]= (E^(-x λ) x^(-1 + r) (1/λ)^(-r))/Gamma[r] for x > 0, and 0 otherwise 

Here r and λ are the parameters used. In the following graph, r = 7 and λ = 4/7. 

In[92]= Plot[PDF[GammaDistribution[7, 4/7], x], {x, 0, 9}, FrameLabel -> {"x", 
"f"}, Frame -> True, PlotLabel -> "Gamma Distribution with r = 7 and 
λ = 4/7", LabelStyle -> (FontFamily -> "Helvetica-Bold")] 

Out[92]= [Graph: "Gamma distribution with r = 7 and λ = 4/7", f plotted against x for x from 0 to 9] 

Here is a chi-squared distribution as shown in Figure 3.17. 

In[93]= Plot[PDF[ChiSquareDistribution[6], x], {x, 0, 16}, FrameLabel -> {"x", 
"f"}, Frame -> True, PlotLabel -> "A Chi-Squared Distribution", 
LabelStyle -> (FontFamily -> "Helvetica-Bold")] 

Out[93]= [Figure 3.17: "A Chi-squared distribution", f plotted against x for x from 0 to 16] 
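
A chi-squared distribution with k degrees of freedom is a gamma distribution with shape k/2 and scale 2 in Mathematica's parameterization. The sketch below compares the two densities at a few points for k = 6; each difference should simplify to zero.

```mathematica
(* Chi-squared with 6 degrees of freedom versus gamma with shape 3 and scale 2 *)
Table[PDF[ChiSquareDistribution[6], x] - PDF[GammaDistribution[3, 2], x],
  {x, 1, 5}] // Simplify
```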


Section 3.7 Weibull Distribution 
Parameters α and β must be specified. Mathematica returns the probability density function. 

In[94]= weibdist = PDF[WeibullDistribution[a, b], x] 
Out[94]= (a E^(-(x/b)^a) (x/b)^(-1 + a))/b for x > 0, and 0 otherwise 

Here are some graphs of Weibull distributions: 

In[95]= wplot = Plot[Evaluate[{PDF[WeibullDistribution[2, 3], x], 
PDF[WeibullDistribution[1/4, 1/2], x], PDF[WeibullDistribution[3, 6], 
x]}], {x, 0, 12}, Frame -> True, FrameStyle -> (FontFamily -> 
"Helvetica-Bold")]; 

In[96]= horiz1 = Graphics[Text["<--- {α=1/4, β=1/2}", {1.7412, 0.2974}]]; 

In[97]= horiz2 = Graphics[Text["<--- {α=2, β=3}", {4.311, 0.2418}]]; 

In[98]= horiz3 = Graphics[Text["<--- {α=3, β=6}", {8.166, 0.1512}]]; 

In[99]= Show[wplot, horiz1, horiz2, horiz3] 

Out[99]= [Graphs of the three Weibull densities, labeled {α=1/4, β=1/2}, {α=2, β=3}, and {α=3, β=6}] 


CHAPTER FOUR 
Section 4.5 Generating Functions 


Figure 4.5 shows the probability distribution of the sum when 12 fair dice are thrown. The 
graph was done using g[t], a probability generating function, which we used in Chapter 
Two. 

In[100]= g[t_] := Sum[(1/6) t^i, {i, 1, 6}] 

In[101]= ListPlot[Drop[CoefficientList[Expand[g[t]^12], t], 12], Frame -> True, 
AxesOrigin -> {1, 0}, FrameLabel -> {"Sum", "Probability"}, PlotLabel -> "Sums 
on Twelve Fair Dice", PlotRange -> {0, 0.08}, LabelStyle -> (FontFamily -> 
"Helvetica-Bold")] 

Out[101]= [Figure 4.5: "Sums on twelve fair dice", probability plotted against the sum] 

We also show a graph of the sums when 12 uniform random variables are added. We first 
need the generating function. 

In[102]= ListPlot[CoefficientList[((1 + t)/2)^12, t], Frame -> True, LabelStyle -> 
(FontFamily -> "Helvetica-Bold"), FrameLabel -> {"Sum", "Probability"}, 
PlotRange -> {0, 0.25}, PlotLabel -> "Sums on 12 Uniform Random Variables"] 

Out[102]= [Graph: "Sums on 12 uniform random variables", probability plotted against the sum] 


Section 4.8 Moment Generating Functions 


Probability generating functions and moment generating functions can be found. We give an 
example of each. The probability generating function for the binomial distribution is 

In[103]= FunctionExpand[Sum[(t^x)*Binomial[n, x]*(p^x)*(1 - p)^(n - x), 
{x, 0, n}]] 
Out[103]= (1 + p (-1 + t))^n 

This can also be written as 

In[104]= gen[t_] := (q + p*t)^n 

and we find the expected value as the first derivative of the probability generating function 
when t = 1: 

In[105]= gen'[t] /. t -> 1 
Out[105]= n p (p + q)^(-1 + n) 

Since p + q = 1, this is n p. 

The moment generating function for the exponential distribution is 

In[106]= expmomefcn = Integrate[Exp[t x]*Exp[-x], {x, 0, Infinity}] 
Out[106]= ConditionalExpression[1/(1 - t), Re[t] < 1] 

and this can be expressed as a power series: 

In[107]= Series[(1 - t)^-1, {t, 0, 10}] 
Out[107]= 1 + t + t^2 + t^3 + t^4 + t^5 + t^6 + t^7 + t^8 + t^9 + t^10 + O[t]^11 
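
The moments of the distribution are the derivatives of the moment generating function evaluated at t = 0. The sketch below assumes the function 1/(1 - t) found above for the exponential distribution with mean 1; the name mgf is introduced only for illustration.

```mathematica
(* mgf is a hypothetical name for the moment generating function 1/(1 - t) *)
mgf[t_] := 1/(1 - t)

(* First four moments: the k-th derivative at t = 0 *)
Table[D[mgf[t], {t, k}] /. t -> 0, {k, 1, 4}]
(* {1, 2, 6, 24} *)
```

The k-th moment is k!, which is k! times the coefficient of t^k in the power series.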
Section 4.10 Sums of Random Variables II 


Example 4.10.2 Sums of Exponential Random Variables 


Here a limit must be calculated in order to show that the sum of exponential random 
variables becomes normal. We show the limit of log M[Z; t] as n becomes infinite. 

In[108]= lmz = (-t*Sqrt[n]) - n*Log[1 - t/Sqrt[n]] 
Out[108]= -Sqrt[n] t - n Log[1 - t/Sqrt[n]] 

In[109]= Limit[lmz, n -> Infinity] 
Out[109]= t^2/2 

Here is a graph of the sum of three independent exponential random variables. The 
probability density function is f[x] = (1/2) x^2 e^(-x), x >= 0: 

In[110]= f2[x_] := (1/2) x^2*Exp[-x] 

In[111]= Plot[f2[x], {x, 0, 15}, Frame -> True, LabelStyle -> (FontFamily -> 
"Helvetica-Bold"), FrameLabel -> {"Sum", "Probability"}, PlotRange -> 
{0, 0.28}, PlotLabel -> "Sums on Three Exponential Random Variables"] 

Out[111]= [Graph: "Sums of three exponential random variables", probability density plotted against the sum] 
Section 4.11 The Central Limit Theorem 


We discussed sampling from the uniform distribution in this appendix in Chapter Two. 
Here we want samples of size 3 from a uniform distribution on the integers from 1 to 20. 
We show how to draw 100 such samples and compute the sample mean of each one. We 
show a histogram of the sample means as an illustration of the central limit theorem. 

In[112]= Table[RandomInteger[{1, 20}, 3], {i, 1, 100}]; 

In[113]= totals = Apply[Plus, %, 1] 
Out[113]= {44, 46, 33, 32, 52, 36, 39, 28, 31, 48, 25, 24, 38, 34, 28, 27, 47, 13, 
36, 25, 41, 39, 27, 33, 52, 25, 44, 39, 31, 17, 41, 13, 26, 19, 16, 34, 
50, 36, 29, 34, 14, 40, 17, 46, 24, 22, 45, 17, 39, 10, 24, 33, 30, 37, 
21, 30, 46, 43, 35, 49, 21, 29, 15, 28, 54, 30, 23, 19, 34, 37, 29, 21, 
24, 29, 37, 30, 21, 17, 26, 45, 49, 29, 40, 24, 37, 20, 29, 23, 22, 53, 
38, 16, 44, 52, 27, 40, 23, 35, 39, 30} 

In[114]= Length[totals] 
Out[114]= 100 

The sample means are the totals divided by 3: 

means = totals/3; 

In[115]= Max[means] 
Out[115]= 18 

In[116]= Min[means] 
Out[116]= 10/3 

In[117]= freqs = BinCounts[means, {11/3, 59/3, 1/3}] 
Out[117]= {0, 0, 2, 1, 1, 2, 4, 0, 2, 1, 4, 2, 3, 5, 3, 2, 3, 3, 6, 5, 2, 1, 3, 4, 
2, 3, 4, 2, 5, 3, 2, 0, 1, 3, 2, 3, 1, 1, 2, 1, 0, 3, 1, 1, 0, 0, 0, 0} 

In[118]= Length[freqs] 
Out[118]= 48 

In[119]= Histogram[means, 6, AxesLabel -> {"Mean", "Frequency"}, LabelStyle -> 
(FontFamily -> "Helvetica-Bold")] 

Out[119]= [Histogram of the sample means] 
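
For comparison with the simulated means, the mean and variance that theory gives for the mean of 3 independent observations can be computed from the built-in discrete uniform distribution. This is a sketch; the name u is introduced only for illustration.

```mathematica
(* u is a hypothetical name for the uniform distribution on the integers 1 to 20 *)
u = DiscreteUniformDistribution[{1, 20}];

(* Mean of a single observation, which is also the mean of the sample mean *)
Mean[u]
(* 21/2 *)

(* Variance of the mean of 3 independent observations *)
Variance[u]/3
(* 133/12 *)
```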




Section 4.12 Weak Law of Large Numbers 


The following commands show the construction of Figure 4.10. 

In[121]= values = RandomVariate[BinomialDistribution[100, 0.38], 100] 
Out[121]= {32, 40, 42, 38, 35, 31, 41, 37, 36, 43, 33, 43, 37, 39, 27, 33, 45, 28, 
38, 29, 38, 37, 31, 40, 44, 32, 41, 43, 44, 33, 45, 38, 43, 34, 35, 37, 
36, 30, 39, 40, 39, 39, 37, 29, 40, 30, 30, 42, 37, 46, 37, 39, 38, 48, 
38, 48, 33, 42, 44, 39, 21, 31, 37, 42, 21, 43, 36, 38, 
46, 34, 41, 43, 41, 44, 47, 42, 33, 37, 51, 41, 42, 42, 36, 35, 40, 
32, 37, 41, 34, 34, 36, 45, 37, 40, 37, 38, 32, 31, 42, 48} 

Now we show how the partial sums of these values produce a mean value that approaches 
0.38. 

In[122]= newvalues = Table[Sum[values[[i]], {i, 1, j}]/(100*j), {j, 1, 100}] 
Out[122]= {8/25, 9/25, 19/50, 19/50, 187/500, 109/300, 37/100, 37/100, 83/225, 3/8, 
..., 379/1000} 
[a list of 100 fractions; only the first ten and the last are shown here] 

Now 379/1000 is very close to 0.38 and the graph of these means is interesting: 


ListPlot[newvalues, AxesOrigin — {0, 0.35}, Frame — True, FrameLa- 
bel — {"Number of Trials", "Probability of Success"}, PlotLabel — "Suc- 
cessive Mean Values", LabelStyle — (FontFamily — "Helvetica-Bold")] 
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Out [123]= 
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Successive mean values 
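For readers without Mathematica, the running-mean calculation can be cross-checked with a short Python sketch. It is not part of the book's session: the function names are illustrative, the binomial counts are simulated by summing Bernoulli trials, and the seed is arbitrary, so the simulated values will differ from those printed above. The first ten counts are copied from the Mathematica output.

```python
from fractions import Fraction
import random


def running_means(counts, trials_per_count):
    """Return the successive proportions of successes as exact fractions."""
    means = []
    total = 0
    for j, count in enumerate(counts, start=1):
        total += count
        means.append(Fraction(total, trials_per_count * j))
    return means


def simulate_counts(n_samples, trials, p, seed=0):
    """Draw n_samples binomial(trials, p) counts by summing Bernoulli trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(trials)) for _ in range(n_samples)]


# First ten counts taken from the Mathematica output shown above.
book_counts = [32, 40, 42, 38, 35, 31, 41, 37, 36, 43]
book_means = running_means(book_counts, 100)
print(book_means)

# A fresh simulation with an arbitrary seed (an assumption of this sketch).
sim_means = running_means(simulate_counts(100, 100, 0.38, seed=1), 100)
print(float(sim_means[-1]))
```

Exact fractions are used so that the partial means can be compared directly with the rational output of Mathematica.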
Section 4.14 Distribution of the Sample Variance


Figure 4.11 shows the variances of samples of size 3 drawn without replacement from a
uniform distribution on the set {1, 2, ..., 20}. The graph is drawn as follows.


In[124]= vardata = Flatten[Table[{i, j, k}, {i, 1, 20}, {j, 1, 20}, {k, 1, 20}], 2];
In[125]= variances = Table[Variance[vardata[[i]]], {i, 1, Length[vardata]}];
In[126]= sortvariances = Sort[variances];
In[127]= valuesvars = Union[sortvariances];
In[128]= freqvars = Table[Count[sortvariances, valuesvars[[i]]], {i, 1,
Length[valuesvars]}];
In[130]= ticks = {{20, 20}, {40, 40}, {60, 60}, {80, 80}, {100, 100}}
Out[130]= {{20, 20}, {40, 40}, {60, 60}, {80, 80}, {100, 100}}
In[131]= ListPlot[Transpose[{valuesvars, freqvars}], Frame -> True, FrameLabel ->
{"Variance", "Frequency"}, PlotRange -> {0, Max[freqvars]}, PlotLabel ->
"Sample Variances", Ticks -> {ticks, Automatic}, LabelStyle -> (FontFamily -> "Helvetica-Bold")]


Out[131]= [Figure 4.11: plot titled "Sample Variances"]
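The tally of sample variances can be reproduced outside Mathematica. The sketch below mirrors the Table command as written, which enumerates every ordered triple from {1, ..., 20} (so repeated values are included); the function names are illustrative and not from the book.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sample_variance(sample):
    """Unbiased sample variance (divisor n - 1), kept exact with Fraction."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / (n - 1)


def variance_frequencies(population, size):
    """Tally the sample variance over every ordered sample of the given size."""
    return Counter(sample_variance(s) for s in product(population, repeat=size))


freqs = variance_frequencies(range(1, 21), 3)
print(len(freqs), "distinct variances from", sum(freqs.values()), "samples")
```

Sorting `freqs.items()` gives the (variance, frequency) pairs that the ListPlot command plots.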
A probability density function suggested by Figure 4.11


In[132]= Plot[1 / x, {x, 0, 4}, Ticks -> {{0, Pi}, Automatic}]


Out[132]= [Plot of 1/x for 0 < x < 4]


Section 4.14 Hypothesis Tests and Confidence Intervals
for a Single Mean


Example 4.14.1


If the mean and standard deviation are known, which is the case in this example, a confidence
interval can be found using the following.


In[133]= Needs["HypothesisTesting`"]


A 95% confidence interval for the population mean:


In[134]= NormalCI[2200, 4591/5] // N
Out[134]= {400.361, 3999.64}


Note that we used the standard deviation of the mean based on a sample of size 25. The
default confidence level is 95%, but this can be changed to any probability.


In[135]= NormalCI[2200, 4591/5, ConfidenceLevel -> 0.8436] // N
Out[135]= {898.65, 3501.35}
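The two intervals can be checked with Python's standard library. This is a sketch of the usual known-variance interval, mean ± z × (standard error); the function name `normal_ci` is illustrative and is not the Mathematica NormalCI command.

```python
from statistics import NormalDist


def normal_ci(mean, std_error, confidence_level=0.95):
    """Two-sided interval mean +/- z * std_error for a known standard error."""
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return (mean - z * std_error, mean + z * std_error)


# Standard error 4591/5: standard deviation 4591 and sample size 25.
ci95 = normal_ci(2200, 4591 / 5)
ci_custom = normal_ci(2200, 4591 / 5, confidence_level=0.8436)
print(ci95)
print(ci_custom)
```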


Student t Distribution


The Student t distribution is included in Mathematica. Here is the Student t distribution
with 5 degrees of freedom:


In[136]= tdist = PDF[StudentTDistribution[5], x]


Out[136]= (200 Sqrt[5])/(3 Pi (5 + x^2)^3)
Note that Mathematica returns the probability density function. Now here are some graphs
of the Student t distribution.


In[137]= Plot[Evaluate[{PDF[StudentTDistribution[2], x],
PDF[StudentTDistribution[10], x], PDF[StudentTDistribution[50], x]}],
{x, -3, 3}, Frame -> True, FrameStyle -> (FontFamily -> "Helvetica-Bold")]


Out[137]= [Plot of the three Student t densities for -3 < x < 3]


The Student t distributions in Figure 4.15 are for 5, 20, and 40 degrees of freedom.
Confidence intervals can be calculated.


Example 4.1.4
Suppose the data are as follows:


In[138]= tdata = {8, 7, 3, 5, 9, 4, 10, 2, 6, 7}
Out[138]= {8, 7, 3, 5, 9, 4, 10, 2, 6, 7}
In[139]= Variance[tdata]
Out[139]= 203/30
In[140]= Mean[tdata]
Out[140]= 61/10

In[141]= StudentTCI[61/10, Sqrt[(203/30)/10], 9] // N
Out[141]= {4.23916, 7.96084}
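The same interval can be recomputed in Python as mean ± t × sqrt(s²/n). The standard library has no Student t quantile function, so the critical value below is hard-coded from a t table; that constant and the function name are assumptions of this sketch rather than anything taken from the book.

```python
from math import sqrt
from statistics import mean, variance

# Assumed table value: 0.975 quantile of Student t with 9 degrees of freedom.
T_CRIT_975_DF9 = 2.262157


def t_interval(data, t_crit):
    """Two-sided t interval for the mean, given the critical value t_crit."""
    n = len(data)
    center = mean(data)
    std_error = sqrt(variance(data) / n)  # variance() uses divisor n - 1
    return (center - t_crit * std_error, center + t_crit * std_error)


tdata = [8, 7, 3, 5, 9, 4, 10, 2, 6, 7]
low, high = t_interval(tdata, T_CRIT_975_DF9)
print(low, high)
```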


Section 4.15 Tests on Two Samples 


Mathematica can do hypothesis testing and find confidence intervals directly from sample
data. As an example, suppose the following are samples from two normal populations.


In[142]= data1 = {7, 4, 5, 6, 7, 3, 4, 2, 5, 8, 9, 12}
Out[142]= {7, 4, 5, 6, 7, 3, 4, 2, 5, 8, 9, 12}

In[143]= data2 = {8, 9, 5, 6, 7, 8, 3, 4, 5, 6, 3, 4, 5, 7, 6, 3, 4, 9}
Out[143]= {8, 9, 5, 6, 7, 8, 3, 4, 5, 6, 3, 4, 5, 7, 6, 3, 4, 9}


Note that the sample sizes are not the same. A 95% confidence interval for the difference
between the means can be found. We can assume the population variances are equal or not.
The default assumption is that the variances are unequal.
In[144]= MeanDifferenceCI[data1, data2, EqualVariances -> True]
Out[144]= {-1.45699, 2.12366}


Now assume the variances are not equal.


In[145]= MeanDifferenceCI[data1, data2]
Out[145]= {-1.62744, 2.2941}


The two assumptions give somewhat different confidence intervals.

We can also find a confidence interval for the ratio of the variances. We must specify the
sample ratio of the variances and then the number of degrees of freedom for both the numerator
and the denominator.


In[146]= FRatioCI[Variance[data1]/Variance[data2], 11, 17]
Out[146]= {0.681112, 6.41411}


Since 1 is in this interval, the data are consistent with equal true variances.

p values are given for hypothesis tests. We must specify the data and the difference between
the means we wish to test. We test whether or not the means differ by 1. Again, the true
variances are presumed to be unequal.


In[147]= LocationTest[{data1, data2}, 1]
Out[147]= 0.451979


LocationTest returns the p value for the null hypothesis that the difference between the true
means equals the specified value (here 1) against the alternative that it does not. A small
p value would indicate that the null hypothesis is unlikely to be true; here the p value is
large, so the data give no reason to reject the hypothesis that the means differ by 1.
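The equal-variance interval reported by MeanDifferenceCI can be checked by hand: pool the two sample variances, form the standard error of the difference, and multiply by a t critical value with n1 + n2 - 2 = 28 degrees of freedom. In this sketch the critical value is hard-coded from a t table (an assumption, since the standard library has no t quantile), and the function name is illustrative.

```python
from math import sqrt
from statistics import mean, variance

# Assumed table value: 0.975 quantile of Student t with 28 degrees of freedom.
T_CRIT_975_DF28 = 2.048407


def pooled_mean_difference_ci(a, b, t_crit):
    """Two-sided interval for mean(a) - mean(b), assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    std_error = sqrt(pooled_var * (1 / na + 1 / nb))
    diff = mean(a) - mean(b)
    return (diff - t_crit * std_error, diff + t_crit * std_error)


data1 = [7, 4, 5, 6, 7, 3, 4, 2, 5, 8, 9, 12]
data2 = [8, 9, 5, 6, 7, 8, 3, 4, 5, 6, 3, 4, 5, 7, 6, 3, 4, 9]
low, high = pooled_mean_difference_ci(data1, data2, T_CRIT_975_DF28)
print(low, high)
```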


F Distribution
Here are graphs of some F distributions:


In[148]= fplots = Plot[Evaluate[{PDF[FRatioDistribution[4, 10], x], PDF[FRatioDistribution[20, 20],
x], PDF[FRatioDistribution[2, 5], x]}], {x, 0, 6}, Frame -> True];

In[149]= arrow1 = Graphics[Text["<---------- F(2,5)", {0.9, 0.9579}]];

In[150]= arrow2 = Graphics[Text["<----- F(4,10)", {2.0213, 0.3355}]];

In[151]= arrow3 = Graphics[Text["<----- F(20,20)", {1.997, 0.6252}]];

In[152]= Show[fplots, arrow1, arrow2, arrow3]

Out[152]= [Plot of the three F densities with text labels]
Section 4.16 Least Squares Linear Regression
We show the construction of Figure 4.18.


In[153]= rdata = {{92, 104}, {86, 91}, {104, 123}, {109, 102},
{75, 86}, {100, 99}, {91, 92}, {110, 114}, {128, 99}}

Out[153]= {{92, 104}, {86, 91}, {104, 123}, {109, 102},
{75, 86}, {100, 99}, {91, 92}, {110, 114}, {128, 99}}

In[154]= scatter = ListPlot[rdata, Frame -> True, AxesOrigin -> {70, 80},
PlotStyle -> {PointSize[0.015]}, PlotLabel -> "Scatter Plot for Data",
FrameLabel -> {"Math Score", "IQ"}, LabelStyle -> (FontFamily ->
"Helvetica-Bold")]


Out[154]= [Scatter plot titled "Scatter plot for data"]


To fit a straight line to the data we use the LinearModelFit command. The first x specifies
the basis function (a constant term is included by default) and the second x names the
independent variable. Mathematica offers a wide range of functions that can be specified in
addition to this simple linear function.


In[155]= lfit = LinearModelFit[rdata, x, x]


Out[155]= FittedModel[63.0792 + 0.382444 x]


In[156]= lfit["ANOVATable"]


Out[156]=        DF   SS        MS        F-Statistic   P-Value
          x      1    284.368   284.368   2.5117        0.157022
          Error  7    792.521   113.217
          Total  8    1076.89


In[157]= lfit["ParameterTable"]


Out[157]=      Estimate   Standard Error   t-Statistic   P-Value
          1    63.0792    24.2581          2.60034       0.0354077
          x    0.382444   0.241314         1.58484       0.157022
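The fitted coefficients and the sums of squares in the ANOVA table can be verified from the textbook least squares formulas, slope = Sxy/Sxx and intercept = ȳ - slope × x̄. The following Python sketch does this directly; the function names are illustrative and the code is not a substitute for LinearModelFit.

```python
def least_squares_line(points):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def anova_sums(points, intercept, slope):
    """Return (model, error, total) sums of squares for the fitted line."""
    mean_y = sum(y for _, y in points) / len(points)
    ss_total = sum((y - mean_y) ** 2 for _, y in points)
    ss_error = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    return ss_total - ss_error, ss_error, ss_total


rdata = [(92, 104), (86, 91), (104, 123), (109, 102), (75, 86),
         (100, 99), (91, 92), (110, 114), (128, 99)]
intercept, slope = least_squares_line(rdata)
ss_model, ss_error, ss_total = anova_sums(rdata, intercept, slope)
print(intercept, slope)
print(ss_model, ss_error, ss_total)
```

The F statistic in the table is the ratio of the model mean square to the error mean square, here `ss_model / (ss_error / 7)`.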
There are many other options available in addition to the analysis of variance and the
parameter table.


It is always useful to see the fit visually.


In[158]= t[x_] := 63.0792 + 0.38244 x
In[159]= stline = Plot[t[x], {x, 70, 130}];
In[160]= Show[scatter, stline]


Out[160]= [Scatter plot for data with the fitted line superimposed]
Section 4.17 Control Chart for X Bar


Example 4.17.1 


We show a plot of a quality control chart as in Figure 4.20. 


In[161]= controldata = {{3, -1, 6, 4}, {9, 0, 3, -2}, {-12, 4, -9, -6}, 
{1l, 9, 4, 1}, {-1, -2, 4, -1}, {8, 1, -2, 3}, {-1, -4, -9, -3}, 
{-8, -3, -6, -4}, fl, -2, -2, 1}, {-2, -2, =35 -2}, {0, 4, 1, -2}} 
{{3, -1, 6, 4}, {9, 0, 3, -2}, {-12, 4, -9, -6}, {1l, 9, 4, 1}, 
{-1, -2, 4, -1}, {8, 1, 255 3}, {-1, -4, -9, -3}, 
{-8, -3, -6, -4}, fl, -2, -2, 1}, {-2, -2, =3; -2}, {0, 4, 1, -2}} 

In[162]= Length[controldata] 

Out[162]= 11 
In[163]= means = Table[Mean[controldata[[i]]], {i, 1, Length[controldata]}]


Out [163]= {3 9-32 02,7 522} 
2 4 4 2 4 4 2 44 


In[164]= ucl = Graphics[Line[{{0, 5.0979}, {10.5, 5.0979}}]];
In[165]= lcl = Graphics[Line[{{0, -6.188}, {10.5, -6.188}}]];
In[166]= ltext = Graphics[Text["LCL", {11.25, -6.2}]];
In[167]= utext = Graphics[Text["UCL", {11.25, 5.0979}]];
In[168]= Show[ListPlot[means, AxesOrigin -> {0, 0}],
ucl, lcl, ltext, utext, AxesLabel -> {"X", "Mean"},
LabelStyle -> (FontFamily -> "Helvetica-Bold"), PlotRange -> {-6.2, 6.2}]




Out[168]= [Control chart of the sample means with the UCL and LCL lines]


CHAPTER FIVE


Section 5.2 Joint and Marginal Distributions 


Example 5.2.1 


Probabilities for the joint distribution of X and Y are found by the following:


In[169]= jprobs =
Table[Binomial[5, x] * Binomial[5 - x, y] * (1/2)^(10 - x), {x, 0, 5},
{y, 0, 5 - x}]


Out[169]= {{1/1024, 5/1024, 5/512, 5/512, 5/1024, 1/1024}, {5/512, 5/128, 15/256, 5/128, 5/512},
{5/128, 15/128, 15/128, 5/128}, {5/64, 5/32, 5/64}, {5/64, 5/64}, {1/32}}

The marginal distribution for X can be found as follows:


In[170]= Apply[Plus, jprobs, 1]


Out[170]= {1/32, 5/32, 5/16, 5/16, 5/32, 1/32}
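The joint table and its row sums can be rebuilt with exact arithmetic in Python. The sketch takes the last factor to be (1/2)^(10 - x), the reading under which the probabilities sum to 1; the variable names follow the Mathematica session but the code itself is only an illustration.

```python
from fractions import Fraction
from math import comb

half = Fraction(1, 2)

# Row x holds the probabilities for y = 0, ..., 5 - x.
jprobs = [[comb(5, x) * comb(5 - x, y) * half ** (10 - x) for y in range(6 - x)]
          for x in range(6)]

# Summing each row gives the marginal distribution of X.
marginal_x = [sum(row) for row in jprobs]
print(marginal_x)
print(sum(marginal_x))
```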


Example 5.2.2 


The joint probability density function is plotted as follows:


In[171]= f[x_, y_] := x^2 + (8/3) * x * y
In[172]= Plot3D[f[x, y], {x, 0, 1}, {y, 0, 1}, ViewPoint -> {1.3, -1.8, 2},
AxesLabel -> {"X", "Y", "f"}, LabelStyle -> (FontFamily -> "Helvetica-Bold")]

Out[172]= [Surface plot of f(x, y) over the unit square]


Marginal distributions are easily found.


In[173]= mf[x_] := Integrate[f[x, y], {y, 0, 1}]
In[174]= mf[x]


Out[174]= (4 x)/3 + x^2


In[175]= mg[y_] := Integrate[f[x, y], {x, 0, 1}]
In[176]= mg[y]


Out[176]= 1/3 + (4 y)/3


Probabilities are volumes.


In[177]= Integrate[f[x, y], {x, 1/2, 1}, {y, 0, 2/3}]


Out[177]= 5/12
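Because f(x, y) = x² + (8/3)xy is a polynomial, the marginal density and the probability above can be checked with exact term-by-term integration. The sketch below stores the polynomial as a dictionary of coefficients; that representation and the helper names are assumptions made for illustration only.

```python
from fractions import Fraction

# f(x, y) = x^2 + (8/3) x y stored as {(power of x, power of y): coefficient}.
F = {(2, 0): Fraction(1), (1, 1): Fraction(8, 3)}


def integrate_y(poly, low, high):
    """Integrate out y over [low, high]; returns {power of x: coefficient}."""
    result = {}
    for (i, j), c in poly.items():
        value = c * (Fraction(high) ** (j + 1) - Fraction(low) ** (j + 1)) / (j + 1)
        result[i] = result.get(i, 0) + value
    return result


def integrate_x(poly_x, low, high):
    """Integrate a one-variable polynomial {power: coefficient} over [low, high]."""
    return sum(c * (Fraction(high) ** (i + 1) - Fraction(low) ** (i + 1)) / (i + 1)
               for i, c in poly_x.items())


marginal_x = integrate_y(F, 0, 1)  # coefficients of the marginal density of X
total = integrate_x(marginal_x, 0, 1)
probability = integrate_x(integrate_y(F, 0, Fraction(2, 3)), Fraction(1, 2), 1)
print(marginal_x, total, probability)
```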


Section 5.3 Conditional Distributions and Densities 


Conditional distributions are ratios of the distributions previously calculated.


In[178]= fxgivy[x_] := f[x, y] / mg[y]
In[179]= fxgivy[x]


Out[179]= (x^2 + (8 x y)/3)/(1/3 + (4 y)/3)


In[180]= fygivx[y_] := f[x, y] / mf[x]
In[181]= fygivx[y]

Out[181]= (x^2 + (8 x y)/3)/((4 x)/3 + x^2)


Section 5.4 Expected Values


Both conditional and unconditional expectations can now be found.
In[182]= eygivx = Integrate[y * fygivx[y], {y, 0, 1}]


Out[182]= ((8 x)/9 + x^2/2)/((4 x)/3 + x^2)


In[183]= FullSimplify[%]


Out[183]= 1/2 + 2/(12 + 9 x)


The integral of E[Y|X = x] * f(x), where f(x) is the marginal density mf[x], gives the
expected value of Y.


In[184]= ey = Integrate[eygivx * mf[x], {x, 0, 1}] // N
Out[184]= 0.611111
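By the law of total expectation, E[Y] also equals the double integral of y f(x, y) over the unit square, which gives an independent check on the value returned by Integrate. For a polynomial density each monomial integrates in closed form, as the following sketch shows; the dictionary representation and the function name are illustrative assumptions.

```python
from fractions import Fraction


def unit_square_moment(poly, x_power=0, y_power=0):
    """Integrate x^x_power * y^y_power * poly(x, y) over the unit square."""
    return sum(c / ((i + x_power + 1) * (j + y_power + 1))
               for (i, j), c in poly.items())


# f(x, y) = x^2 + (8/3) x y as {(power of x, power of y): coefficient}.
density = {(2, 0): Fraction(1), (1, 1): Fraction(8, 3)}

total_mass = unit_square_moment(density)
expected_y = unit_square_moment(density, y_power=1)
print(total_mass, expected_y, float(expected_y))
```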


Section 5.6 Bivariate Normal Densities 


The bivariate normal density is defined as follows.


In[185]= k[r_] := 1 / (2 * Pi * Sqrt[1 - r^2])
In[186]= bivnormf[x_, y_] := k[r] * Exp[(-1 / (2 (1 - r^2))) * (x^2 - 2 * r * x *
y + y^2)]


Let the correlation coefficient (r) be 1/2:


In[187]= r = 1/2
Out[187]= 1/2


Here is a plot of the bivariate normal density with ρ = 1/2:


In[188]= Plot3D[bivnormf[x, y], {x, -3, 3}, {y, -3, 3}, AxesLabel -> {"x", "y",
"f"}, PlotPoints -> 40, LabelStyle -> (FontFamily -> "Helvetica-Bold")]
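As a numerical sanity check on the density, the sketch below evaluates the standard bivariate normal density (zero means, unit variances, correlation r) in Python and approximates its total mass with a simple Riemann sum. The grid size, step, and function names are choices made for this illustration.

```python
from math import exp, pi, sqrt


def bivariate_normal_density(x, y, r):
    """Standard bivariate normal density with correlation r, |r| < 1."""
    k = 1 / (2 * pi * sqrt(1 - r ** 2))
    return k * exp(-(x ** 2 - 2 * r * x * y + y ** 2) / (2 * (1 - r ** 2)))


def grid_mass(r, half_width=6.0, step=0.1):
    """Riemann-sum approximation to the total probability on a square grid."""
    n = int(round(half_width / step))
    points = [i * step for i in range(-n, n + 1)]
    return sum(bivariate_normal_density(x, y, r)
               for x in points for y in points) * step ** 2


peak = bivariate_normal_density(0, 0, 0.5)
mass = grid_mass(0.5)
print(peak, mass)
```

At the origin the density reduces to the constant k[r], which for r = 1/2 is 1/(π√3).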
Out[188]= [Surface plot of the bivariate normal density]


Here is a contour plot.


In[189]= ContourPlot[bivnormf[x, y], {x, -3, 3}, {y, -3, 3}, PlotPoints -> 40]


Out[189]= [Contour plot of the bivariate normal density]


CHAPTER SIX


Recursions and Markov Chains are easily done in 
Mathematica. 


Section 6.2 Some Recursions and their Solutions 


Example 6.2.1 


We saw a recursion for the Fibonacci numbers previously in this Appendix in Chapter 2. 
We now define a new recursion and solve it. 


In[190]= b[n_] := b[n] = p * b[n - 1] + (1 - p) * (1 - b[n - 1])
In[191]= b[1] = 0
Out[191]= 0


In[192]= b[2] = 1 - p
Out[192]= 1 - p


Now we compute some values for b[n].


In[193]= b[1]

Out[193]= 0

In[194]= b[2]

Out[194]= 1 - p

In[195]= b[3]

Out[195]= 2 (1 - p) p

In[196]= Simplify[Expand[Table[b[n], {n, 2, 7}]]]


Out[196]= {1 - p, -2 (-1 + p) p, 1 - 3 p + 6 p^2 - 4 p^3, 4 p (1 - 3 p + 4 p^2 - 2 p^3),
1 - 5 p + 20 p^2 - 40 p^3 + 40 p^4 - 16 p^5, 2 p (3 - 15 p + 40 p^2 - 60 p^3 + 48 p^4 - 16 p^5)}

Now we look at a particular value, b[4], and expand it about 1/2:


In[197]= exp = Normal[Series[b[4], {p, 1/2, 3}]]


Out[197]= 1/2 - 4 (-1/2 + p)^3


This suggests a simple form for b[n]: b[n] = 1/2 - (1/2) (2 p - 1)^(n - 1).
Mathematica will also solve the recursion; we choose the special case where p = 1/3.


In[198]= RSolve[{c[n] == (1/3) * c[n - 1] + (2/3) * (1 - c[n - 1]),
c[1] == 0, c[2] == 2/3}, c[n], n]


Out[198]= {{c[n] -> (1/2) 3^(-n) (3 (-1)^n + 3^n)}}

In[199]= Expand[%]


Out[199]= {{c[n] -> 1/2 + (1/2) (-1)^n 3^(1 - n)}}


In[200]= Limit[1/2 + (1/2) * (-1)^n * 3^(1 - n), n -> Infinity]
Out[200]= 1/2


This shows that c[n] -> 1/2 as n becomes large.
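The recursion is easy to iterate with exact fractions, which lets the pattern suggested by the series expansion be tested for several n at once. In the sketch below the closed form 1/2 - (1/2)(2p - 1)^(n - 1) is treated as a conjecture to be checked, and the function names are illustrative.

```python
from fractions import Fraction


def b_sequence(p, n_max):
    """Return [b(1), ..., b(n_max)] from b(n) = p b(n-1) + (1-p)(1 - b(n-1)), b(1) = 0."""
    values = [Fraction(0)]
    for _ in range(n_max - 1):
        prev = values[-1]
        values.append(p * prev + (1 - p) * (1 - prev))
    return values


def b_closed_form(p, n):
    """Conjectured closed form 1/2 - (1/2) (2p - 1)^(n - 1)."""
    return Fraction(1, 2) - Fraction(1, 2) * (2 * p - 1) ** (n - 1)


p = Fraction(1, 3)
seq = b_sequence(p, 12)
print([float(v) for v in seq])
print(all(seq[n - 1] == b_closed_form(p, n) for n in range(1, 13)))
```

With p = 1/3 the factor 2p - 1 is -1/3, so the terms alternate around 1/2 and the deviation shrinks by a factor of 3 at each step.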


Section 6.3 Random Walk and Ruin 


The construction of Figure 6.6 is shown here. We first define the function a[g] as used in
the text. We'll suppose that p = 0.49 and q = 0.51.
In[201]= 0.51 / 0.49 // N
Out[201]= 1.04082
In[202]= a[g_, h_, p_] := (1 - ((1 - p)/p)^g) / (1 - ((1 - p)/p)^(g + h))
In[203]= Plot3D[a[g, 30, p], {g, 20, 30}, {p, 0.44, 0.49},
AxesLabel -> {"g", "p", "Probability"}, PlotPoints -> 40, LabelStyle ->
(FontFamily -> "Helvetica-Bold")]


Out[203]= [Surface plot of a[g, 30, p]; vertical axis: Probability]
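The function a can be evaluated in Python as well. The sketch below assumes the classical gambler's ruin reading, in which a(g, h, p) is the probability of gaining h before losing g when each play is won with probability p; that interpretation, the p = 1/2 special case, and the function name are assumptions of this example rather than statements from the book.

```python
def win_probability(g, h, p):
    """(1 - (q/p)^g) / (1 - (q/p)^(g + h)) with q = 1 - p; g/(g + h) when p = 1/2."""
    if p == 0.5:
        return g / (g + h)
    ratio = (1 - p) / p
    return (1 - ratio ** g) / (1 - ratio ** (g + h))


for p in (0.44, 0.49):
    print(p, win_probability(20, 30, p))
```

Under this reading the value satisfies the one-step relation a(g, h) = p a(g + 1, h - 1) + q a(g - 1, h + 1), which is a convenient check on the formula.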


Section 6.5 Markov Chains 


Example 6.5.1 


The matrix in this example is entered as follows.
In[204]= m =
{{1/10, 3/10, 3/10, 3/10}, {3/10, 1/10, 3/10, 3/10},
{3/10, 3/10, 1/10, 3/10}, {3/10, 3/10, 3/10, 1/10}}


Out[204]= {{1/10, 3/10, 3/10, 3/10}, {3/10, 1/10, 3/10, 3/10},
{3/10, 3/10, 1/10, 3/10}, {3/10, 3/10, 3/10, 1/10}}


In[205]= MatrixForm[m]


Out[205]=
1/10  3/10  3/10  3/10
3/10  1/10  3/10  3/10
3/10  3/10  1/10  3/10
3/10  3/10  3/10  1/10

Powers of m can be computed, indicating the limiting form of this matrix.


In[206]= MatrixForm[m.m]


Out[206]=
7/25  6/25  6/25  6/25
6/25  7/25  6/25  6/25
6/25  6/25  7/25  6/25
6/25  6/25  6/25  7/25


For higher powers of m it is best to use the MatrixPower command. Here the 20th power
of m is written in the usual form for a matrix.


In[207]= MatrixForm[MatrixPower[m, 20]]


Out[207]= Each diagonal entry is 23841857910157/95367431640625 and each off-diagonal
entry is 23841857910156/95367431640625:

23841857910157/95367431640625  23841857910156/95367431640625  23841857910156/95367431640625  23841857910156/95367431640625
23841857910156/95367431640625  23841857910157/95367431640625  23841857910156/95367431640625  23841857910156/95367431640625
23841857910156/95367431640625  23841857910156/95367431640625  23841857910157/95367431640625  23841857910156/95367431640625
23841857910156/95367431640625  23841857910156/95367431640625  23841857910156/95367431640625  23841857910157/95367431640625


The entries are nearly equal, and each is very close to 1/4. The fixed point for m can also be found.


In[208]= Solve[{a, b, c, d}.m == {a, b, c, d}, {a, b, c, d}]
Out[208]= {{b -> a, c -> a, d -> a}}


This indicates that all the entries for the fixed vector are equal.
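The powers of m can be reproduced with exact rational arithmetic. The sketch below is a plain-Python stand-in for MatrixPower based on repeated squaring; the helper names are illustrative.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_pow(m, power):
    """Matrix power by repeated squaring; power must be a non-negative integer."""
    size = len(m)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    base = m
    while power:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


# Transition matrix: 1/10 on the diagonal, 3/10 elsewhere.
m = [[Fraction(1 if i == j else 3, 10) for j in range(4)] for i in range(4)]
m2 = mat_pow(m, 2)
m20 = mat_pow(m, 20)
print(m2[0])
print(m20[0][0], m20[0][1])
```

The denominator 95367431640625 is 5^20, which reflects the second eigenvalue -1/5 of m; its 20th power is the size of the remaining deviation from 1/4.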


Example 6.5.6 


In this example we need various matrix products and an inverse. These are done in the 
following way. 


In[209]= q = {{0, 3/4, 0}, {1/4, 0, 3/4}, {0, 1/4, 0}}
Out[209]= {{0, 3/4, 0}, {1/4, 0, 3/4}, {0, 1/4, 0}}


In[210]= iminusq = IdentityMatrix[3] - q


Out[210]= {{1, -3/4, 0}, {-1/4, 1, -3/4}, {0, -1/4, 1}}


In[211]= inv = Inverse[iminusq]


Out[211]= {{13/10, 6/5, 9/10}, {2/5, 8/5, 6/5}, {1/10, 2/5, 13/10}}


In[212]= inv.{1, 1, 1}


Out[212]= {17/5, 16/5, 9/5}
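The inverse of I - q and its row sums can be checked with a small Gauss-Jordan elimination over exact fractions. In the theory of absorbing Markov chains this inverse is usually called the fundamental matrix; reading Example 6.5.6 that way is an assumption of this sketch, and the helper name is illustrative.

```python
from fractions import Fraction


def inverse(matrix):
    """Gauss-Jordan inverse of a square matrix of Fractions (assumed nonsingular)."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


q = [[Fraction(0), Fraction(3, 4), Fraction(0)],
     [Fraction(1, 4), Fraction(0), Fraction(3, 4)],
     [Fraction(0), Fraction(1, 4), Fraction(0)]]
i_minus_q = [[Fraction(int(i == j)) - q[i][j] for j in range(3)] for i in range(3)]
inv = inverse(i_minus_q)
row_sums = [sum(row) for row in inv]
print(inv)
print(row_sums)
```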
Appendix B


Answers for Odd-Numbered 
Exercises 


CHAPTER 1 


Exercises 1.1 


1. A is included in samples ABC, ABD, ABE, ACD, ACE, and ADE and is not in samples 
BCD, BDE, BCE, and CDE. 


3. S = {(x, y) | x ≠ y; x, y ∈ {1, 2, ..., 9}}. S has 9 · 8 = 72 points.


5. S = {(x1, x2, x3, x4, x5) | xi ∈ {G, N}, i = 1, 2, ..., 5}, or S = {G, N} × {G, N} ×
{G, N} × {G, N} × {G, N}, where A × B denotes the cross product of the sets A and B.
S has 32 points.


7. There are 15 sample points:
AAAA
NAAAA, ANAAA, AANAA, AAANA
NNAAAA, NANAAA, NAANAA, NAAANA, ANNAAA, ANANAA, ANAANA, AANNAA, AANANA, AAANNA


9. S = {(x, y) | x, y ∈ {1, 2, ..., 12}, x ≠ y}. S contains 12 · 11 = 132 points.
11. There are 15 sample points:
32, 42, 43, 52, 53, 54, 62, 63, 64, 65, 72, 73, 74, 75, 76

Probability: An Introduction with Statistical Applications, Second Edition. John J. Kinney. 
© 2015 John Wiley & Sons, Inc. Published 2015 by John Wiley & Sons, Inc. 
13.


HHHHH 
HHHHT 
HHHTH 
HHTHT 
HHHTT 
HHTHH 


Exercises 1.3 


3. 


5s 


a) 5/16 
b) 13/16 
a) S = {(x, y) | x, y ∈ {1, 2, 3, 4, 5}}.
b) 10/25 
c) 6/25 


. P(A ∩ B) ≤ min{P(A), P(B)}, so the given probabilities are impossible.

. 0.7

. Consider P(B) = P[B ∩ (A ∪ Ā)] = P[(B ∩ A) ∪ (B ∩ Ā)]. Then use the addition law.
. P(exactly one of A, B) = P(A) + P(B) − 2P(A ∩ B).


Exercises 1.4 


3. 


23. 


a) 58/135 
b) 15/29 


. a) 4.9% 


b) 35/98 


. Use the addition law. 
a) 1/7 


b) 1/10 


. 0.994 
a4 


log(1—r) 
b) n> iam) 


. 1/4. [Consider P(1|1 or even).] 
. a) No 


b) No 
c) 1/6 
d) 1/3 


eae 
21. 


a) 17/70 
b) 12/17 


1/2 
25.


27. 


29. 
a) p1 = p, p2 = q + r(p − q),
p3 = 2r(r − 1)(p − q) + p, where q = 1 − p
b) No
c) r = 1/2
a) 1/7 
b) 2/5 


The estimate of p is 2 × the sample proportion answering “yes” − 1/2.


Exercises 1.5 


7. 


9. 


b) 198/15625 = 0.002146 


Exercises 1.6 


1. 


a) 1 − (1 − pA)(1 − pB)(1 − pC)
b) pA pB + pA pC + pB pC − 2 pA pB pC


3. pA pC (2 − pA)(2 − pC)


Exercises 1.7 


1. 


a) 4/9 
a) 1/9 
b) 2/9 


. There are more than 26³ = 17,576 people in Colorado Springs
- a) 280 


b) 431 


. 34,650 

. 7/24 

. a) At least one six in 4 rolls of a fair die. 
- a) 0.4035 


1316998181 


b) P (at least 1 duplicate) = 


. 1/70 
. 2162/54145 = 0.0399298 


- 201,600 
. 13,860 
“Tr 2 3 4 5 6 7 8 9 10 11 12 13
1 17 41 89 1343 3071 39547 122491 495739 2984059 35829883 
Py 12 72 96 144 1728 3456 41472 124416 497664 2985984 35831808 
a) 3/8 
b) 4/9 
a) 96/15625 = 0.006144 
23. (N − 2)/N
25. 35 


27. a) 11/16 
b) 7/8 


Supplementary Exercises for Chapter 1 


1. 77/969 = 0.0794634 
3. 1/2 
5. 3/4 
7. 4 
9. 5/33 
11. a) 1/2 
b) 5/9 
c) 1/30 
13. a) 2/5 
b) 5/6 
c) 2/3 
15. 20% A's, 50% B’s, and 30% C's. 
17. 217006443/318555566 = 0.68122
19. a) 7/0 
b) 510 
c) 207,446,400 
91: 1 =+/2/2 
25. a) S = {(x, y) | x, y ∈ {1, 2, ..., 6}, x < y}.
b) 3/5 
c) 13/15 


27. 


(8) 
29. 1621364909/31750677980 = 0.0510655


250 
100 
50 
33. 4/5 
35. a) 0.30 
b) 0.10 
39. a) 1/208,012 
b) 12/13 


31. = 1.11595 × 10⁻¹⁴
43. a) 0.936
b) 6 


(m2 —m+1) 
45. m2(m—1) 


47. 0.4 
49. 1 − N!/((N − n)! Nⁿ)


51. a) 7,694,644,696,200 
b) 1,889,912,632,400 
c) 70,671,744 
d) 211,360,681,548 


53. 2/n 
55. 21,349/22,407 = 0.952783 


CHAPTER 2 
Exercises 2.2 


1. a) 2/5 
b) 2/5 


3. a) S = {(x1, x2, x3, x4) | xi ∈ {H, T}, i = 1, 2, 3, 4}
b) 3/16 
c) 7/15 


5. a) k= 6/31 
b) 25/31 
c) F(x) = 0      if x < 1
          6/31   if 1 ≤ x < 2
          18/31  if 2 ≤ x < 3
          27/31  if 3 ≤ x < 4
          1      if x ≥ 4
9. Sum         2   3   4   5   6   7   8   9   10  11  12
   441 × Prob  36  60  73  76  70  56  35  20  10  4   1


11. G(y) = 0    if y < 2
           1/4  if 2 ≤ y < 3
           1/2  if 3 ≤ y < 4
           3/4  if 4 ≤ y < 5
           1    if y ≥ 5


13. f(−1) = 1/3, f(0) = 1/2, f(1) = 1/6
15. F(x) = i/n, i ≤ x < i + 1, i = 1, 2, ..., n
Exercises 2.3


1. μ = 13/3, σ² = 20/9
3. a) E(X) = 1/2, Var(X) = 0.45 
b) $3,900 


5. E(X) = $12,900, Var(X) = $31,900,000, σ = $1786.06
7. E(X) = 3/5, Var(X) = 72/175 


2 
9, E(X) = me Var(X) = 
11. 9/25 
13. 1/6 if x=123 


a) P(X=x=21/8 if x=45,6 
1/16 if x=7,8 
b) $1.00 
15. E(X) 


Exercises 2.5 


1. a) 0.1091 
b) 0.999437 


3. a) 0.10292 
b) 20 
c) 0.3445 


5. a) 0.854134 
b) 0.0591414 


7. a) 0.004629630 
b) 0.0006766 
c) 0.0791125 


9. a) X is binomial with n = 7 and p = 0.2 
b) 73/15625 
c) 73/2313 


11. —$60,100 


13. y 3 2 at. @ 4 2 
59049xProb 29,161 10 40 80 80 29,678 


15. a) 0.966634 
b) 93 


17. 5/36 
19. 0.0973832 


21. a) 0.128506 
b) 0.0318415 


23. 65/256 




Exercises 2.6 


Sx WD = 


. End points are 4/13 and 1/2 
. 10 to 20 

. 0.23598, 0.28551 

. 0.330775, 0.473463 

. A: 0.102589, 0.151481 


J: 0.0275727, 0.0495844 
No 


Exercises 2.7 


1. 
3. 


11. 


a = 0.0378016, 6 = 0.0912053 


a) 0.0866925 
b) 0.60801 


. c= 4and p = 0.99992 


. 0.0731714, 0.162123 
- 0.413143, 0.527647 


Yes 
400 


» 0.527975, 0.667947 


0.419418, 0.580582 


Exercises 2.9 


1. 
3. 
5. 


0.0669796 
3/4 

a) 0.0064 
b) 0.04096 


. 8/65 

. 0.123959 
. 10732 

. 0.0313811 
a) 1/7 


b) 7 
17. a) $21
b) No 


19. r/t 


Exercises 2.10 


1. 1/4 
3. 677,007 /832,370 
5. a) x          3  4  5  6   7   8
      56 × Prob  1  3  6  10  15  21


b) μ = 27/4, σ² = 27/16
7. a) 0.880527 
b) 79/3200 


9. a) 0.956094 
b) 0.05629 


11. 0.0573656 


Exercises 2.11 


1. a) 0.416248 
b) 0.00202524 


3. 49/50 
5. 0.68122 


7. a) 27,683/32,340 
b) 97/2000 


9. 0.104399 
11. 10 


13. a) 0.999616 
b) 0.00740206 


15. a) 314.837 


Exercises 2.14 


1. x         1         2         3         4          5          6
   Binomial  0.32768   0.4096    0.2048    0.0512     0.0064     0.00032
   Poisson   0.367879  0.367879  0.18394   0.0613132  0.0153283  0.00306566


3. 0.104137


5. a) 0.0916
b) 0.18045
c) 0.9380


7. a) 0.0469122
a) 0.0661276


17. 


19. 
21. 


23. 
. The Poisson approximation gives 0.067086.
. a) 0.227975 


b) 30 seconds 


. 0.014388 
. a) 0.265026 


b) 9.1774 x 10> 


a) 0.04979 
b) 0.03374 
c) 1.535 ft? 


23 


a) 0.180 
b) 0.143 


XxX 0 1 2 3 
c) 
Prob e72 2e72 2e-2 1-—S5e? 


d) 0.218 
e) 4 


¥ 0 1 2 3 4 5 6 or more 


32 32 128 643 
=e4 et 1-—e 
3 3 15 15 


Supplementary Problems for Chapter 2 


1. 
3. 


0.938031 


a) 0.0111603 
b) 0.0127952 
c) 29 


4/5 


1/4 


- a) 2, 162/54, 145 


b) 109 


. 5/273 
. 0.231639 


a, a ') a =p" 
y=1 


b) 1/3 


. 8/65 
. a) 0.013695 


b) 0.33282 


- 0.865618 
. 20 


: Pa + 3p, — pa) — pr) + 3pa4(1 — pa)? — pp) 
27.
29. 
31. 
33. 
35. 
37. 


39. 


41. 
43. 


0.74286 
0.707143 
0.206051 
0.290228, 0.379419 
Yes 
1692 
Ww 

a) 

216 x Prob 
b) -1/2 


-1 


0 


125 75 


1 2 
1d: I 


c) Make mas large as possible 


0.112553 
ii 


CHAPTER 3 


Exercises 3.1 


1. 


11. 


13. 


15. 
19. 


- aye 
b) μ = 1/3, σ² = 1/9


- a) 3/8 


b) 19/64 
c) 19/27 
d) 0.693361 


-b) 1/4 


c) arc cos(1/3) 
-3/4 


b) 1/4 


- b) 3/4 


c) μ = 0, σ² = 2/3
a) 3/2 


b) FQ) = 


c) PY >y= 


a) μ = 0, σ² = 27/5


μ = 16/35, σ² = 201/4,900


μ = 3/2, σ² = 1/4
Exercises 3.2


Se NUN w 


. 1/2 


1/3 


. 0.597126 
. The exact probability is 3/5. 


V3-1 
3 


. In 1.765 — 1 


Exercises 3.4 


1. 
3. 


19. 


- aye 


E[X] = 9/4, Var[X] = 1/16 
a) e72/3 

b) e7!/3 

-17/15 

b) 0.293681 


- A should be taken. 

A= 1/2 

e625 

. 122,000 miles 

~ b) FQ) = 1— e302), x > 2. 


—6 
d) 1.69495 x 1074 


a) A 
b) ew2/A 


Exercises 3.5 


1. 


a) 0.866386 
b) 0.421084 


- a) 0.0026 


b) 0.682689 


. Ul 
- 68.75% 


Yes 


. $37.98 
. 17.01% will be outside warning limits 
- a) 0.213485 




b) 12 bags gives a probability of about 0.95. 


- —0.427183855 
- 0.975412 
23. 910.059


25. a) 0.308538 
b) 0.2113 
c) 0.0152 


Exercises 3.6 
Answers given are the exact result followed by the normal approximation. 


1. 0.529141, 0.529514 


3. 0.0795892 and 0.0666053 are the exact probabilities. The normal approximations are 
0.07965579 and 0.0668073. 


5. a) 0.219353, 0.214145 
b) 0.934849 


7. 26, 26 

9. 26, 26 

11. 0.229707, 0.230906 
13. 0.9851, 0.9854 

15. 0.657949, 0.642834 


Exercises 3.7 


2 
_— -2y 5 93 

1. a) fO) =e «2 eT 
. 
— 27 4 96 ye 

b) go) =e? «2 arr} 


3. a) μ = 1/2, σ² = 1/20
b) 0.66229 


11. Yes 
13. a) e12 


c) 0.434598 


Exercises 3.8 


1. b) 0.301305 
c) 0.67032 
3. a) et /20,000 
b) 125.331h 
c) 0.324652 
5.


7. 
9. 


b) 0.283468 
1 


p(n 2)a 
0.443559 
Supplementary Exercises for Chapter 3


NM WwW = 


11. 


13. 
17. 
19. 


21. 


23. 


25. 
27. 
29. 


. 0.0730169 
. 0.1686 

» 2.27789 x 10+ 
. a) 0.367879 


b) 0.306432 


- b) 0.7875 


c) 62/15 

d) 729/8000 

b) 112/243 
0 

c) F(x) = 45x47 -— 40° 
1 

V1I/5 

poi,o =1/5 

a) k=2 

b) FQ) = 


—x2 


0.0220301 


μ = 1, σ² = 1/2


if x <0 
if0<x<1/2 
if 1/2<x<1 
ifx>1 


P(x = —2) = 1/6, P(x = -1) = 1/3, P(x = 2) = 1/2 


2/3 


CHAPTER 4 


Exercises 4.3 


1. g(y) = 1/9, 4 < y < 13


3. g(y) = 1/(2√y), 0 < y < 1
5. a) E(Y) = e!t!/20*, Var(¥) = e249" (e% — 1) 
In y—v :
b) gy) =— snl o >0 
8) = oy 2a > y 
a yh 


1 
9. gy) = —a ee 
3y3 


5° sin(—1) < y < sin(1) 


1 
11. gy) = SS 
2v l-y 
13 ae ae 
a) 80) = qo 29 


15. 602/3 
17. g(y) =nle~“™’, y > 0 
1 1 


21. No 

23. a) Y=a+(b—a)X 
b) Y= —=InXx 

27. 35 


29. g(y) = 2/3, 1/2 < y < 2


Exercises 4.4 


5, 
Sum        3  4  5  6   7   8   9   10  11  12
64 × Prob  1  3  6  10  12  12  10  6   3   1


7. a) PIK+¥ =2)=( 
b) 7/2 


NI- 
Set 
b 
| 
a 
a 
We 
eed 
N 
“NX 
Il 
N 
be 
- 


Exercises 4.5 


Sum         3  4  5  6   7   8   9   10  11  12  13  14  15  16  17  18
216 × Prob  1  3  6  10  15  21  25  27  27  25  21  15  10  6   3   1


5. All the subsets of {a, b, c} 

7. 1- (1-4) log 9 

9. a) tPy(t) 
b) Py(t*) 

11. 
Sum         2  3  4  5   6   7   8   9   10  11  12
126 × Prob  1  3  6  10  15  21  20  18  15  11  6
Exercises 4.7


3. a) eA) 

5. a) 16/31 
b) = (16 + 8t4+ 42 + 28 + 14) 
c) μ = 26/31, σ² = 1,122/961


71393157 


a) 1073741824 ~ 0.0664901 
2046448125 _ 
b) 2147483648 = 0.952952 
9, a) ef! 
b) wee? =1 


Exercises 4.8 


5. a) e7All-e') 
7. a) μ = −5/6, σ² = 65/36


b) 1- 274 27 - a ae 


36 48 
9. a) (4 ) 


bg] 7) ceri? 


1) it 4-2ty 4 27,4t _ of 
11. z(¢ et) + rd e') 
13. a) 0.774538 
b) N(30, 6) 
15. a) ~(e — ¢") 
b) μ = 5/2, σ² = 1/12
17. a) μ = 5/3, σ² = 10/9
b) X is binomial with n = 5 and p = 1/3 


Exercises 4.10 


1; 4) 
b) μ = 3/2, σ² = 1/4

3. a) μ = 1/5, σ² = 1/25
b) fase" 2 >0 

5. μ = 10, σ² = 20

7. 0.97725 

9. μ = σ² = 1


11. M is the generating function for the variable X — A, where X is Poisson with 
parameter A. 
17. d) X1 + X2 has probability distribution given by g(y)
19. s(e! dog? = 3) 
21. 0.814453 


Exercises 4.11 


3. 0.974085 
5. 166 


7. a) 0.3993 
b) 1008 


9. 0.008853 
11. 0.0681571 
13. 0.0227501 


15. a) 0.382925 
b) 0.682689 


17. The probability that X is in the range 3.11 to 3.61 is 0.95833 


Exercises 4.13 


1. 77.7867, 1789.36 
3. ((n − 1)s²/χ²_U, (n − 1)s²/χ²_L), where χ²_U and χ²_L are upper and lower χ² values.
5. 12 
7. 0.898904 
9. 2.55646, 13.1746


Exercises 4.14 


1. a) Yes 
b) 0.03338 
3. a) X < 0.248897 
b) 0.41365 
c) n=49 
5. a) 0.0228 
b) 0.158655 


7. a) 5.175, 10.825 
b) 3.8756, 45.282 


9. a) Reject if x > 4.02816 or if X < 3.98179 
b) Reject H₀
c) 0.877193 


11. Yes 
13.


15. 


17. 


19. 




a) 0.055115 
b) 0.21186 


a) No 
b) 0.837945 


a) X < 73.4538 
b) Yes 

c) 0.939007 

d) 2.34 × 10⁻¹²


a) 40,800 
b) 1,089,055 to 3,749,072 
c) 40,778 


Exercises 4.15 


1. 


Sen un Ww 


11. 


13. 


15. 


17. 
19. 


a) Accept H₀
b) Yes 


. Accept H₀

- 0.153189, 11.9492 
. —0.223, 8.223 

- a) Accept H₀


b) 0.141803, 4.16881
c) Reject H₀


a) —960.19, —309.81 
b) —949.55, —320.55 
c) 0.00242706 


a) Accept H₀
b) Reject 




a) Reject the hypothesis that painting reduces the top speed. 


No, if a = 5% 
n= 110 


Exercises 4.16 


1. 


b) y = —0.273177 + 1.61719x 




c) F(1,4) = 99.041. P[F(1,4) > 99.041] = 5.726 × 10⁻⁴, so the fit is a very good one.


- b) y = 66.198 − 0.862x.


c) F(1,8) = 16.9988, P[F(1,8) > 16.9988] = 3.33 × 10⁻³, so the fit is a very good one.


- b) y= —0.192719 + 1.22056x 


c) x = 7.31915 + 0.485106y


- a) y = 0.489784 + 0.272431x


b) r= 0.92 
an whai4
b) £ = 0.953807 


Exercises 4.17 


1. b) 8.8292, 11.2458 


Supplementary Exercises for Chapter 4 


— 2 
lea) Se" - 1) - = 
b) μ = a/3, σ² = a²/18
gl+t_y 


3. a) 7. 
b) μ = 0.3862, σ² = 0.03909
k k 

5. pT (1 es =) 

7. 4/81 

9. Poisson[Ap] 

11. 16 

13. a) 0.23975 
b) 0.0569231 

15. a) 0.135666 


b) 0.0139034 
c) 0.8133 
d) 0.0126737 


17. a) S(e-t- 1) 
2 
b) (k+1)(k+2) 
/2 
o) Se? -1/2-1) 
19. a) μ = 5/3, σ² = 5/9
b et et 
gag 
21. a) P(X = 1) = 2/5, P(X = 2) = 1/5, P(X = 3) = 2/5 
b) μ = 2, σ² = 4/5
te 
23. aT >0 
at 
25. a) g(y)=y2'>-l<y<l 
27.
29. 
31. 


33. 


35. 
37. 
39. 
41. 
43. 


45. 


47. 
49. 


51. 
53. 
55. 


57. 


59. 


61. 




y 
ees 1 
gly) rer a 
ey) =2/7,y> LEW) =2 


3-Vyt+l 
gy) = —,-1<ys8 


2/3 if0<y<I1 

Zo)= ; 
1/3 ifl<y<2 

0.017008 
Yes. FUE, > 56.4) = 0.002456 
No 
5.88586, 8.17407 
a) Accept H₀
b) 9.44793, 11.1521 


a) 70 

b) 0.617075 
0.965068 

a) Reject H₀
b) 0.00925637 
658.88, 701.13 
Yes 

a) Accept H₀
b) 0.2610863 


a) Reject H₀
b) Reject H₀


a) x < 181.997 
b) 0.857088 


1.48773, 3.97234 


CHAPTER 5 


Exercises 5.2 


3. 


a) x      2     2.5   3     3.5   4
   f(x)   0.17  0.14  0.20  0.33  0.16
   g(y)   0.05  0.15  0.28  0.40  0.12

b) E(X) = 3.085, E(Y) = 3.195 

c) 0.68 




5, a) fey) = (‘) (;) (, ee ; / eS Pe eee eeres: 


188 up 
1 if x=0 
b) f(x) = =. if x=1; g()is similar, 
1 ‘ 
1 if x= 2 
7. a) 1/65 
a xtl 
b) fa) = x= 1, 2, ... 10 
- if y=9, 10 
BY) = y+1 
rs if y=0,1, ... 8 
9. a) P(X=x,Y=y= @ ("”) prrygntmY x=0,1,...,1,y =0,1, ... , 


b) X + Y is binomial with parameters n, + ny and p. 


_ e6t (xX) (3\9(1\% ee 
Il. a) f(y) = (*) (2) (4) Pe OT x. TPaOily ai 
=9/2(9/2) 
b) 8) = y= O1, i 
c) E(Y) = = +6 


13. P(X =x,Y=y)= (?) (5) ces) / (3) x =0,1,2,3; y =0,1,2;x+ y <3 


15. b) f(x) = 123(1 — x7),0<x<1 
a(y) = 12y°(1-y),0<y<1 
17. a) 3 
b) f@) = 6x1 -— x), 0<x<1 


5(1-y) O0<y<l 


2=4 4 
SL -l<y<0O 
19. a) W/y 0 1 2 fiw) 
0 1/8 1/8 0 
1 1/8 2/8 1/8 
2 0 1/8 1/8 
gy) 2/8 4/8 2/8 


b) f0) = ee . ee > ais similar 
21. b) 1/16 
c) 7/16 
23. b) f(x) = 5(1— 7), O<x<1
ey) = 3y7, O< y<1 


Exercises 5.3 


3. a) f(xly) = ae O<x<1 


2@+y) 
fol) = 22 o<ysi 


3y+2 

b) E(XX|Y = y) = = 
3x42 

Bea = 


29 
5. f(xly) = aes 1 


2 
fol = 3, 0<y<x 


7. a) k=8 
b) E(Y) = 8/15 


= 3/2 
9. a) E(Y|X) = =x Ly O<x< Baie? 2 ree ,0<y<l 
b) 3/8 


Vl. a) f(x) = 14 2x - 3x2, O<x< 1 
£0) = 36 - 8y + 3y*), O<y<1 
b) folx) = — 7 O<y<1-x 


Exercises 5.5 


13. py f(aly) = 2x/y, 0<x< yy 


c) 2 
15. f(aly) = ay V<*< Wy 
fol = 4. e<y<l 
17. a) f(x) = =, x = 1,23.9(y) is similar 
b) gas. y= [2 .2,4o% 
19. 1/3 
21. by f(x) = 3 /4,0<x<2 
c) No 
d) 7/12 
23. 220 — 30/2


27. 14/9/247 


Exercises 5.6 


1. c) 0.355435 


Exercises 5.7 


1. oD = z>0 


=e 
(z+2)2’ 
= (3-2), z>0 
3. gz) = 273 5 
3% —37°+4) 1<z<2 


Supplementary Exercises for Chapter 5 


1. a) 3/128 
b) No : 
c) fd = _— LD eee 
2 
fa) = BY 2<y<2 
3 Dip: 3: 
f(xly) = ae 32 R<9 
3(x2-+4y2) 


(0) =e! oo 
3. a) $f(x, y) = 1/36$ if x = y, x = 1, 2, ..., 6;
$f(x, y) = 2/36$ if x > y, x = 1, 2, ..., 6
b) E(X) = 161/36 
c) E(Y) = 91/36 
d) E(X + Y) = E(X) + E(Y) 


5. 1/3 
7. a) 3/16 
b) /y/2,0<x< yy 
g, tolls? = 0.853582 
1 
ile 


13. a) $f(x, y) = 12x$, 0 < y < 2x, 0 < x < 1/2
b) $g(y) = \tfrac{3}{2}(1 - y^2)$, 0 < y < 1
c) No 
15. a) 2
b) No 
c) 76/81 


17. a) 19 
By =o 
c) -1/5/3 
d) 1 
e) 250 
f) 44 


19. a) 12 
b) f@®) =6°,0<x<1 
g(y) = by — y),0<y<1 


c) No 
d) 27/32 
21. a) y\x   0     1     2     3     4
       0     1/16
       1           4/16  3/16
       2                 3/16  2/16
       3                       2/16
       4                             1/16
       f(x)  1/16  4/16  6/16  4/16  1/16


b) 3/16 
c) 2 
23. a) 1/2 
b) $f(x) = 1$, 0 < x < 1
sg ss 
gl=42/2 
CHAPTER 6 


Exercises 6.2 


3. X is a geometric random variable. $a_n = pq^{n-1}$, n ≥ 1


5. a) a, =p at qay-| + Pgay_|.N 2 2,a9 =a > 0, ay =p 


7 (1=p)(p1—P2)"!+p2 | 
, 1—p,+p2 ~~ 


3/5 \" 4 _ 
9. a) 2(3) + 5.1 = 0,1,2, bie 


b) 0.754238 


1+p+p? 
yi, bese" 
P 
g(y) column of the table in Exercise 21 a), Supplementary Exercises for Chapter 5: 1/16, 7/16, 5/16, 2/16, 1/16

Exercises 6.3


n 
1. Pu= 


Exercises 6.4 


1. a) wy = lu, =u, =0, 


= 14 ——o? __. _29801576_ _ 
b) Us) = 1+ Gyan’ Teneise7 — 0-025041 
— sy 58333184 
ee (ps)}3-+(1—s)(1+ps+(ps)2)’ 3486784401 GIST 78 
3, L+2pq-Spq?+p7q?=p*¢° 
Exercises 6.5 
$,°(1733/5) 
5. (444/699,165/699,90/699) 
9. 1/3  0    1/3  0    1/3  0
   0    1/3  0    1/3  0    1/3
   1/3  0    1/3  0    1/3  0
   0    1/3  0    1/3  0    1/3
   1/3  0    1/3  0    1/3  0
   0    1/3  0    1/3  0    1/3


Supplementary Exercises for Chapter 6 


n 
1. k-1 = 
a) matty ( 2 ) po Upp NS 3, Uy = Lu = Gi = 7g =P +g 
k=3 


b) Uy, = PWp-1 + QUy—13 Vn = Pun} + GVn—13Wy = PVn-1 + QWn-1> 
Vy = 3QVn-1 ~ 3¢7Vn_2 te (p> + GP Vn-3 
n> 4,09 = 0, Vj) =p, Vo = 20g, vy = 3pq" 


1 
sU + @ +p)" 
2n\ 1 nn 
3: (7") m1? 9 


9. p, = 5+ Qp-V"Ln21 
Appendix C


Standard Normal Distribution 


The entries in this table give the areas under the standard normal curve from 0 to z. 


Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 


0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359 
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753 
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141 
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517 
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879 
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224 
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549 
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852 
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133 
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389 
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621 
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830 
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015 
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177 
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441


(continued) 


Probability: An Introduction with Statistical Applications, Second Edition. John J. Kinney. 
© 2015 John Wiley & Sons, Inc. Published 2015 by John Wiley & Sons, Inc. 
(Continued)


Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09


1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
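
Entries of the standard normal table can be checked numerically. The following is a minimal sketch, not part of the original appendix; it assumes only the standard identity relating the normal distribution function to the error function, and the function name `area_zero_to_z` is a hypothetical helper chosen for illustration.

```python
import math


def area_zero_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and z.

    Uses Phi(z) - 1/2 = (1/2) * erf(z / sqrt(2)), where Phi is the
    standard normal cumulative distribution function.
    """
    return 0.5 * math.erf(z / math.sqrt(2.0))


# Spot-check a few values of z against the tabulated areas.
for z in (0.5, 1.0, 1.96, 3.0):
    print(f"z = {z:4.2f}  area = {area_zero_to_z(z):.4f}")
```

Rounded to four decimal places, these values should agree with the table rows for z = 0.5, 1.0, 1.9 (column 0.06), and 3.0.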
THE t DISTRIBUTION TABLE


The entries in the table give the critical values of t for the specified number of degrees of freedom and areas in the right tail.


df Area in the Right Tail under the t Distribution Curve
0.10 0.05 0.025 0.01 0.005 0.001

1 3.078 6.314 12.706 31.821 63.657 318.309 
2 1.886 2.920 4.303 6.965 9.925 22.327 
3 1.638 2.353 3.182 4.541 5.841 10.215
4 1.533 2.132 2.776 3.747 4.604 7.173 
5 1.476 2.015 2.571 3.365 4.032 5.893 
6 1.440 1.943 2.447 3.143 3.707 5.208 
7 1.415 1.895 2.365 2.998 3.499 4.785 
8 1.397 1.860 2.306 2.896 3.355 4.501 
9 1.383 1.833 2.262 2.821 3.250 4.297 
10 1.372 1.812 2.228 2.764 3.169 4.144 
11 1.363 1.796 2.201 2.718 3.106 4.025 
12 1.356 1.782 2.179 2.681 3.055 3.930 
13 1.350 1.771 2.160 2.650 3.012 3.852 
14 1.345 1.761 2.145 2.624 2.977 3.787 
15 1.341 1.753 2.131 2.602 2.947 3.733 
16 1.337 1.746 2.120 2.583 2.921 3.686 
17 1.333 1.740 2.110 2.567 2.898 3.646 
18 1.330 1.734 2.101 2.552 2.878 3.610 
19 1.328 1.729 2.093 2.539 2.861 3.579
20 1.325 1.725 2.086 2.528 2.845 3.552
21 1.323 1.721 2.080 2.518 2.831 3.527 
22 1.321 1.717 2.074 2.508 2.819 3.505 
23 1.319 1.714 2.069 2.500 2.807 3.485 
24 1.318 1.711 2.064 2.492 2.797 3.467 
25 1.316 1.708 2.060 2.485 2.787 3.450
26 1.315 1.706 2.056 2.479 2.779 3.435 
27 1.314 1.703 2.052 2.473 2.771 3.421 
28 1.313 1.701 2.048 2.467 2.763 3.408 
29 1.311 1.699 2.045 2.462 2.756 3.396 
30 1.310 1.697 2.042 2.457 2.750 3.385 
31 1.309 1.696 2.040 2.453 2.744 3.375 
32 1.309 1.694 2.037 2.449 2.738 3.365 
33 1.308 1.692 2.035 2.445 2.733 3.356 
34 1.307 1.691 2.032 2.441 2.728 3.348 
35 1.306 1.690 2.030 2.438 2.724 3.340 
36 1.306 1.688 2.028 2.434 2.719 3.333 
37 1.305 1.687 2.026 2.431 2.715 3.326 
38 1.304 1.686 2.024 2.429 2.712 3.319 
39 1.304 1.685 2.023 2.426 2.708 3.313 
40 1.303 1.684 2.021 2.423 2.704 3.307 
∞ 1.282 1.645 1.960 2.326 2.576 3.090
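
The last row of the t table (infinite degrees of freedom) coincides with the critical values of the standard normal distribution, so it can be reproduced with Python's standard library alone. The sketch below is an illustration added here, not the author's method; the names `TAIL_AREAS` and `normal_critical_value` are hypothetical. Rows with finite degrees of freedom would need a t-distribution quantile function, which the standard library does not provide.

```python
from statistics import NormalDist

# Right-tail areas used as column headings in the t table.
TAIL_AREAS = (0.10, 0.05, 0.025, 0.01, 0.005, 0.001)


def normal_critical_value(tail_area: float) -> float:
    """Value z such that the area to the right of z under the
    standard normal curve equals tail_area."""
    return NormalDist().inv_cdf(1.0 - tail_area)


# Limiting critical values, rounded to three decimals as in the table.
limits = [round(normal_critical_value(a), 3) for a in TAIL_AREAS]
print(limits)
```

The printed list should match the final row of the table to three decimal places.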
CHI-SQUARED DISTRIBUTION TABLE


The entries in this table give the critical values of χ² for the specified number of degrees of freedom and areas in the right tail.


df Area in the Right Tail under the Chi-square Distribution Curve
0.995 0.990 0.975 0.950 0.900 0.100 0.050 0.025 0.010 0.005

1 0.000 0.000 0.001 0.004 0.016 2.706 3.841 5.024 6.635 7.879 
2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.145 1.610 9.236 11.070 12.833 15.086 16.750
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548 
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278 
8 1.344 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955 
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589 
10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188 
11 2.603 3.053 3.816 4.575 5.578 17.275 19.675 21.920 24.725 26.757
12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.300
13 3.565 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688 29.819 
14 4.075 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141 31.319
15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801 
16 5.142 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000 34.267 
17 5.697 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409 35.718
18 6.265 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805 37.156 
19 6.844 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191 38.582 
20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566 39.997
21 8.034 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932 41.401 
22 8.643 9.542 10.982 12.338 14.041 30.813 33.924 36.781 40.289 42.796 
23 9.260 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638 44.181 
24 9.886 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980 45.559 
25 10.520 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314 46.928
26 11.160 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642 48.290
27 11.808 12.879 14.573 16.151 18.114 36.741 40.113 43.195 46.963 49.645
28 12.461 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278 50.993
29 13.121 14.256 16.047 17.708 19.768 39.087 42.557 45.722 49.588 52.336
30 13.787 14.953 16.791 18.493 20.599 40.256 43.773 46.979 50.892 53.672
40 20.707 22.164 24.433 26.509 29.051 51.805 55.758 59.342 63.691 66.766 
50 27.991 29.707 32.357 34.764 37.689 63.167 67.505 71.420 76.154 79.490
60 35.534 37.485 40.482 43.188 46.459 74.397 79.082 83.298 88.379 91.952 
70 43.275 45.442 48.758 51.739 55.329 85.527 90.531 95.023 100.425 104.215 
80 51.172 53.540 57.153 60.391 64.278 96.578 101.879 106.629 112.329 116.321 
90 59.196 61.754 65.647 69.126 73.291 107.565 113.145 118.136 124.116 128.299 
100 67.328 70.065 74.222 77.929 82.358 118.498 124.342 129.561 135.807 140.169 


[F distribution tables: critical values of F for areas in the right tail under the F distribution curve of 0.01 and 0.05, with columns for numerator degrees of freedom 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 20, 25, 30, 40, 50, and 100. The table entries are not legible in this copy.]


Index 


α, 94, 240
Acceptance Sampling, 111, 113, 119
Addition Theorem, 10
AIDS example, 18
All heads problem, 105
Analysis of variance, 262
Average outgoing quality, 113, 122
Average outgoing quality limit, 123
Average waiting time, 342
Axioms of probability, 7

β, 94, 240
Banach match book (candy jars) problem, 107
Bayes’ Theorem, 17
Behrens-Fisher problem, 250
Bernoulli random variables, 204
Bernoulli trials, 81, 330
waiting time for patterns, 339
Binomial coefficients, 44 see also: combinations
Binomial distribution, 81, 213
approximation to hypergeometric, 115
examples, 116
mean, 103
normal approximation, 175, 330
Poisson approximation, 132
probability generating function, 207, 213
recursion, 82
sums, 204
variance, 84, 103
Binomial theorem, 44
Birthday problem, 284
Bivariate normal distribution, 308
Bivariate random variable, 288
Bootstrap, 375

Candy jars (Banach match book) problem, 107
Cauchy distribution, 153, 203
Central limit theorem, 229
Cereal box problem, 216
Chi-squared distribution, 178, 187, 236
Combinations, 43 see also: binomial coefficients, 44
Pascal’s identity, 45
Conditional distributions, 293
Conditional expectation, 295, 303
Conditional probability, 14
examples, 28, 152
Confidence coefficient, 100
Confidence interval
for μ, 89, 90, 241, 243
for σ², 238
for p, 90, 140
for σ, 239
Consumer’s risk, 121
Continuous random variable, 146
Control chart, 266
Control limits, 267
Counting techniques, 39
Counting principles, 34
Coupon Collector’s problem, 216, 362
Contour plots, 310
Cook’s Distance, 374
Correlation coefficient, 298, 300
Covariance, 299
Critical region, 93
Cumulative distribution function, 68, 148, 161
examples, 69, 144
properties, 70, 150

Derangements, 41, 49
Difference equation (recursion), 44
solution, 322
Disjoint events, 16
Distribution function, 68, 149
conditional, 294
example, 69
exponential, 161
F, 251 
Marginal, 283 
properties, 70 
Student’s t, 242 
Weibull, 184 
Double sampling, 124 


Estimator, 89 
interval, 89, 90, 241, 243 
least squares, 258 
point, 89 
unbiased, 98 
Events, 7, 12 
disjoint, 16 
examples, 12 
independent, 22, 23, 24 
mutually exclusive, 9, 13 
Expected value 
bivariate, 298 
conditional, 303 
discrete, 72 
examples, 72 
of a function, 199 
geometric, 102 
hypergeometric, 111 
Poisson, 300 
of sums, 203 
Uniform, 157 
Experiments, 2 
examples, 2 
Exponential distribution, 159 
mean, 160 
memoryless property, 160 
variance, 160 


F distribution, 251 

Fibonacci sequence, 5 

Fixed vector, 346 

Functions of random variables, 195 
products, 314 
quotients, 315 
sums, 203, 312 


Gamma distribution, 178 
mean, 180 
moment generating function, 218 
Variance, 180 

Generating functions, 207, 341 
binomial, 213 
from recursions, 339, 341 
geometric, 215 
means and variances from, 343 


moment, 218 
probability, 213, 215 
properties, 211, 223 

Geometric random variable, 68, 74, 102, 366 
expected value, 74 
probability generating function, 216 
variance, 77 

General Addition Law, 46 


HIV example, 18 
Hazard Rate, 163 
Weibull, 184 
Huygens problem, 384 
Hypergeometric random variable, 111, 114, 128
binomial approximation, 115 
mean, 114 
moment generating function, 218 
variance, 114 
Hypothesis, 93, 240 
alternative, 93 
composite, 95 
null, 93 
Hypothesis testing 
binomial, 96 
on p, 240 
on two means, 248 
on two variances, 251 


Inclusion and exclusion, 12 
Independence, 299 
Indicator random variables, 302 
Independent events, 22, 23, 24 

examples, 24 
Intersection of sets, 9 
Interval estimator, see confidence interval 


Joint probability distributions, 283 
Ken—Ken, 50 


Law of total probability, 16
Least squares, 258 

Linear model, 258 

Linear regression, 258 
Lottery, 128 


Marginal distributions, 286 
Markov chains, 344 
absorbing states, 349 
Matching problem, 39 
Matrix, 344 
fundamental, 344, 350 
stochastic, 345

transition, 344 
Maximum, 48, 198 

sampling distribution, 48 
Mean 

binomial, 84 


candy jars (Banach match book) problem, 107 


control chart, 266 
exponential, 160 
gamma, 178 
geometric, 215 
hypergeometric, 111 
negative binomial, 102 
normal, 166 
Poisson, 131 
properties, 151 
sampling distribution, 229 
sample proportion, 98 
uniform, 157
Weibull, 184 
Median, 48 
race car example, 48 
Moment generating function, 218 
exponential, 220 
gamma, 225 
normal, 225 
properties, 218 
sums, 224 
uniform, 219 
Moments, 218 
Monte Hall example, 21 
Mowing the lawn, 31 
Multiplication theorem, 14 
Mutually exclusive events, 9, 23 


Negative binomial distribution, 102 
mean, 102 
variance, 102 

Normal distribution, 66, 166 
approximation to binomial, 175 
bivariate, 308 
mean, 169 
moment generating function, 223 
sums, 225 
variance, 169 


Operating characteristic curve, 120 


p values, 243 

Pascal’s identity, 45 

Patterns in Bernoulli trials, 339 
Permutations, 40 

Poisson process, 134 
Poisson random variable, 130
approximation to binomial, 132 
mean, 131 
variance, 131 

Poisson’s trials, 214 

Probability 
axioms, 8 
conditional, 14 
examples, 28 
law of total, 16 
theorems, 10 

Probability density function, 147 

Probability distribution function, 62 
binomial, 81 
conditional, 293 
discrete uniform, 63 
exponential, 159 
F, 251
gamma, 178 
geometric, 68 
joint, 283 
marginal, 283 
negative binomial, 102 
normal, 166 
Student’s t, 242
uniform, 157 
Weibull, 184 

Probability generating function, 213 
binomial, 213 
geometric, 215 
properties, 211 
sums, 203, 224 

Producer’s risk, 121 

Proportion, 98
sample, 98 
confidence interval, 100 
distribution, 99 
variance, 99 


Quality control chart for sample means, 266 
Quality control inspector problem, 194 


Race car example, 48 

Random variable, 61 
bivariate, 283 
continuous, 146 
discrete uniform, 63 
expected value, 72 
functions of, 195 
geometric, 68 
hypergeometric, 114 
independent, 299 
indicator, 302 
negative binomial, 102
Poisson, 130
sums, 203, 224
variance, 75
Random walk and ruin, 334
expected duration, 337
solution, 334
Recursion, 49, 82, 108, 322
binomial, 83
candy jars (Banach match book) problem, 107
generating functions from, 131, 341, 215
hypergeometric, 111
Poisson, 130
solving, 326
using to find means and variances, 329, 343
Regression, 258, 309, 322
Reliability, 34, 69, 162, 184
Weibull distribution, 148
Residuals, 258
Risks
consumer’s, 161
producer’s, 121

Sample proportion, 98
Sample space, 2
examples, 3
Sampling
double, 124
Sampling distribution
sample mean, 229, 273
sample median, 48
sample maximum, 198
sample variance, 234
Sampling inspection plans, 112
Socks problem, 357
Standard deviation, 76
distribution, 267
Stochastic matrix, 345
Student’s t distribution, 242
Sums
loaded dice, 66
several dice, 208
two dice, 67
Sums of random variables, 203, 224
expected value, 73, 224
moment generating function, 224
of binomials, 204
of exponentials, 225
of normals, 225
variance, 224
Sums of squares
regression, 261
residual, 261
total, 261
Systems
parallel, 35
series, 34

Tchebycheff’s inequality, 78
Theorems
addition, 10
Bayes’, 17
probability, 10
Transition matrix, 344
regular, 344
Type I error, 94, 240
Type II error, 94, 240

Unbiased estimator, 235
Uniform probability distribution
discrete, 63, 71
examples, 63
joint, 283
mean, 157
sums, 205
variance, 157
Union of sets, 9

Variance, 75, 157, 234
binomial, 84
by conditioning, 307
candy jars (Banach match book) problem, 107
exponential, 160
gamma, 178
geometric, 71
hypergeometric, 111
negative binomial, 102
normal, 166
of sums, 224, 300
Poisson, 131
properties, 151
sample proportion, 98
sampling distribution, 234
uniform, 151
variance, 238
Weibull, 184
Vector
fixed, 347
Venn diagram, 10, 11

Waiting times in Bernoulli trials, 330, 339
Waldegrave problem, 378
Weibull distribution, 184
mean, 185
variance, 185
Weak law of large numbers, 233